WILEY SERIES IN PROBABILITY 
AND MATHEMATICAL STATISTICS 


ESTABLISHED BY WALTER A. SHEWHART AND SAMUEL S. WILKS
Editors 


Vic Barnett, Ralph A. Bradley, J. Stuart Hunter, 
David G. Kendall, Rupert G. Miller, Jr., Adrian F. M. Smith, 
Stephen M. Stigler, Geoffrey S. Watson 


Probability and Mathematical Statistics 


ADLER - The Geometry of Random Fields 

ANDERSON - The Statistical Analysis of Time Series 

ANDERSON - An Introduction to Multivariate Statistical Analysis, 
Second Edition 

ARNOLD - The Theory of Linear Models and Multivariate Analysis 

BARNETT - Comparative Statistical Inference, Second Edition 

BHATTACHARYYA and JOHNSON - Statistical Concepts and
Methods

BILLINGSLEY - Probability and Measure, Second Edition

BOLLEN - Structural Equations with Latent Variables

BOROVKOV - Asymptotic Methods in Queuing Theory 

BOSE and MANVEL - Introduction to Combinatorial Theory

CAINES - Linear Stochastic Systems 

CASSEL, SARNDAL, and WRETMAN - Foundations of Inference in 
Survey Sampling 

CHEN - Recursive Estimation and Control for Stochastic Systems 

COCHRAN - Contributions to Statistics 

COCHRAN - Planning and Analysis of Observational Studies

CONSTANTINE - Combinatorial Theory and Statistical Design 

DOOB - Stochastic Processes

DUDEWICZ and MISHRA - Modern Mathematical Statistics 

EATON - Multivariate Statistics: A Vector Space Approach 

ETHIER and KURTZ - Markov Processes: Characterization and 
Convergence 

FABIAN and HANNAN - Introduction to Probability and
Mathematical Statistics

FELLER - An Introduction to Probability Theory and Its
Applications, Volume I, Third Edition, Revised; Volume II,
Second Edition

FULLER - Introduction to Statistical Time Series

FULLER - Measurement Error Models 

GRENANDER - Abstract Inference 

GUTTMAN - Linear Models: An Introduction 

HALD - A History of Probability and Statistics and Their 
Applications before 1750 

HALL - Introduction to The Theory of Coverage Processes 

HAMPEL, RONCHETTI, ROUSSEEUW, and STAHEL - Robust 
Statistics: The Approach Based on Influence Functions 

HANNAN - Multiple Time Series 

HANNAN and DEISTLER - The Statistical Theory of Linear Systems

HARRISON - Brownian Motion and Stochastic Flow Systems 

HETTMANSPERGER - Statistical Inference Based on Ranks 

HOEL - Introduction to Mathematical Statistics, Fifth Edition


HUBER - Robust Statistics




IMAN and CONOVER - A Modern Approach to Statistics

IOSIFESCU - Finite Markov Processes and Applications

JOHNSON and BHATTACHARYYA - Statistics: Principles and
Methods, Revised Printing

LAHA and ROHATGI - Probability Theory 

LARSON : Introduction to Probability Theory and Statistical 
Inference, Third Edition 

LEHMANN - Testing Statistical Hypotheses, Second Edition

LEHMANN - Theory of Point Estimation 

MATTHES, KERSTAN, and MECKE - Infinitely Divisible Point
Processes

MUIRHEAD - Aspects of Multivariate Statistical Theory

PRESS - Bayesian Statistics 

PURI and SEN - Nonparametric Methods in General Linear Models 

PURI and SEN - Nonparametric Methods in Multivariate Analysis 

PURI, VILAPLANA, and WERTZ - New Perspectives in Theoretical 
and Applied Statistics 

RANDLES and WOLFE - Introduction to the Theory of
Nonparametric Statistics

RAO - Linear Statistical Inference and Its Applications, 
Second Edition 

RAO - Real and Stochastic Analysis 

RAO and SEDRANSK - W.G. Cochran’s Impact on Statistics 

RAO - Asymptotic Theory of Statistical Inference 

ROBERTSON, WRIGHT and DYKSTRA - Order Restricted
Statistical Inference

ROGERS and WILLIAMS - Diffusions, Markov Processes, and
Martingales, Volume II: Ito Calculus

ROHATGI - An Introduction to Probability Theory and 
Mathematical Statistics 

ROHATGI - Statistical Inference

ROSS - Stochastic Processes 

RUBINSTEIN - Simulation and The Monte Carlo Method 

RUZSA and SZEKELY - Algebraic Probability Theory

SCHEFFE - The Analysis of Variance

SEBER - Linear Regression Analysis 

SEBER - Multivariate Observations 

SEBER and WILD - Nonlinear Regression 

SEN - Sequential Nonparametrics: Invariance Principles and 
Statistical Inference 

SERFLING - Approximation Theorems of Mathematical Statistics 

SHORACK and WELLNER - Empirical Processes with Applications 
to Statistics 

STOYANOV - Counterexamples in Probability 


Applied Probability and Statistics 


ABRAHAM and LEDOLTER - Statistical Methods for Forecasting

AGRESTI - Analysis of Ordinal Categorical Data 

AICKIN - Linear Statistical Analysis of Discrete Data 

ANDERSON and LOYNES - The Teaching of Practical Statistics 

ANDERSON, AUQUIER, HAUCK, OAKES, VANDAELE, and 
WEISBERG - Statistical Methods for Comparative Studies 

ARTHANARI and DODGE - Mathematical Programming in 
Statistics 




ASMUSSEN - Applied Probability and Queues 

BAILEY - The Elements of Stochastic Processes with Applications 
to the Natural Sciences 

BARNETT - Interpreting Multivariate Data

BARNETT and LEWIS - Outliers in Statistical Data, Second Edition 

BARTHOLOMEW - Stochastic Models for Social Processes, 
Third Edition 

BARTHOLOMEW and FORBES - Statistical Techniques for 
Manpower Planning 

BATES and WATTS - Nonlinear Regression Analysis and Its 
Applications 

BECK and ARNOLD - Parameter Estimation in Engineering and
Science

BELSLEY, KUH, and WELSCH - Regression Diagnostics: Identifying
Influential Data and Sources of Collinearity

BHAT - Elements of Applied Stochastic Processes, Second Edition 

BLOOMFIELD - Fourier Analysis of Time Series: An Introduction 

BOLLEN - Structural Equations with Latent Variables 

BOX - R. A. Fisher, The Life of a Scientist

BOX and DRAPER - Empirical Model-Building and Response 
Surfaces 

BOX and DRAPER - Evolutionary Operation: A Statistical Method 
for Process Improvement 

BOX, HUNTER, and HUNTER - Statistics for Experimenters: An
Introduction to Design, Data Analysis, and Model Building

BROWN and HOLLANDER - Statistics: A Biomedical Introduction

BUNKE and BUNKE - Statistical Inference in Linear Models

BUNKE and BUNKE - Nonlinear Regression, Functional Relations 
and Robust Methods: Statistical Methods of Model Building 

CHAMBERS - Computational Methods for Data Analysis 

CHATTERJEE and HADI - Sensitivity Analysis in Linear Regression 

CHATTERJEE and PRICE - Regression Analysis by Example 

CHOW - Econometric Analysis by Control Methods 

CLARKE and DISNEY - Probability and Random Processes: A 
First Course with Applications, Second Edition 

COCHRAN - Sampling Techniques, Third Edition 

COCHRAN and COX - Experimental Designs, Second Edition 

CONOVER - Practical Nonparametric Statistics, Second Edition

CONOVER and IMAN - Introduction to Modern Business Statistics

CORNELL - Experiments with Mixtures: Designs, Models and The 
Analysis of Mixture Data 

COX - Planning of Experiments 

COX - A Handbook of Introductory Statistical Methods 

DANIEL - Biostatistics: A Foundation for Analysis in the Health 
Sciences, Fourth Edition 

DANIEL - Applications of Statistics to Industrial Experimentation 

DANIEL and WOOD - Fitting Equations to Data: Computer Analysis
of Multifactor Data, Second Edition

DAVID - Order Statistics, Second Edition 

DAVISON - Multidimensional Scaling 

DEGROOT, FIENBERG and KADANE - Statistics and the Law 

DEMING - Sample Design in Business Research 




Nonlinear Regression, 
Functional Relations and Robust Methods 




Nonlinear Regression,
Functional Relations and
Robust Methods


Statistical Methods of Model Building, 
Volume II 


Edited by 
Helga Bunke 
Academy of Sciences of the GDR 


and 
Olaf Bunke 
Humboldt University, Berlin, GDR 


Translated by the authors 


John Wiley & Sons 
Chichester - New York - Brisbane - Toronto - Singapore 


Translation with slight modification of
K. M. S. Humak, Statistische Methoden der Modellbildung, Band II, Nichtlineare
Regression, robuste Verfahren in linearen Modellen, Modelle mit Fehlern in den Variablen,
published by Akademie-Verlag Berlin in the series Mathematische Lehrbücher und
Monographien


Copyright © 1989 by Akademie-Verlag Berlin, GDR/John Wiley & Sons Ltd. 


All rights reserved. 


No part of this book may be reproduced by any means, or transmitted, or translated
into a machine language without the written permission of the publisher.


Library of Congress Cataloging-in-Publication Data:
Statistical methods of model building.

(Wiley series in probability and mathematical statistics)
Translation of: Statistische Methoden der Modellbildung.
Includes bibliographies and indexes.
Contents: v. 2. Nonlinear regression, functional relations, and robust methods / edited by
Helga Bunke and Olaf Bunke.
1. Mathematical statistics. I. Bunke, Helga. II. Bunke, Olaf. III. Title. IV. Series.
QA276.S78713 1987 519.5 86-15951

ISBN 0 471 91239 5 (v. 2) 


British Library Cataloguing in Publication Data: 


Humak, K. M. S.
Nonlinear regression, functional relations and robust methods:
statistical methods of model building, volume II.
1. Mathematical statistics 2. Mathematical models
I. Title II. Bunke, Helga III. Bunke, Olaf
IV. Statistische Methoden der Modellbildung.
Bd. 2: Nichtlineare Regression, robuste Verfahren in linearen Modellen,
Modelle mit Fehlern in den Variablen. English
519.5 QA276
ISBN 0 471 91239 5


Printed in the German Democratic Republic


The author’s name (K. M. S. Humak) in the German edition is a pseudonym
for ‘Kollektiv Mathematische Statistik: Humboldt-Universität zu Berlin und
Akademie der Wissenschaften der DDR’ (Collective of Mathematical
Statisticians at the Humboldt University of Berlin and the Academy of Sciences
of the GDR). J. Jurečková of Charles University of Prague is the author of
Chapter 2 on robust inference. The English translation of this volume was
prepared by the authors.

This collective was headed by Helga Bunke and Olaf Bunke, who also hold 
overall responsibility for the publication. The authors of the individual chap- 
ters and appendices were as follows: 


Chap. 1      1.1            H. Bunke, W. H. Schmidt
             1.2            U. Schulze
             1.3            M. Nussbaum
Chap. 2                     J. Jurečková
Chap. 3      3.1–3.3.3.3    H.-P. Höschel
             3.3.4–3.4      M. Nussbaum
             3.5–3.9        H.-P. Höschel
Appendices   A1             M. Nussbaum
             A2–A3          W. H. Schmidt


Apart from the authors, B. Grabowski, K. Henschke, B. Seifert, and R. Strüby
helped with the editing of the German edition; K. Henschke supervised
the editorial work. R. Teufel prepared large portions of the manuscripts of
the German and English editions. B. Droge supervised and performed large
portions of the editorial work for the English edition.

Special thanks are also due to R. Höppner of Akademie-Verlag for his
assistance with editing.


Preface 


The theory of linear models, especially methods of linear regression and the
analysis of variance, plays a central role in the statistical analysis of
experimental data and in modelling causal relationships. This theory has been
developed and treated in innumerable papers and monographs, and the book
‘Statistical Inference in Linear Models’ by the same editors, H. Bunke and
O. Bunke, provides a comprehensive presentation. But in many problems of
statistical analysis and modelling these methods are not sufficient. Extended
models as well as more general or different methods are needed. For instance,
this is the case when only nonlinear regression functions give a sufficient
description of causal relations or if the causes, i.e. the explanatory
variables or regressors, can only be observed with errors. Sometimes
irregularities of the observation errors, like the existence of outliers, also
call for the use of so-called ‘robust’ methods. The treatment of such problems
in general demands a considerably higher numerical and computational effort.
But increased computational capabilities have offered the statistician new
possibilities for dealing with such more complicated problems. From this
emanated a vigorous impetus for further development of the statistical theory.
A lot of theoretical and applied research on nonlinear regression analysis, on
functional or structural relations and on nonparametric and robust estimation
emerged in this connection. With this monograph the authors hope to provide a
comprehensive presentation of the state of the art in these fields that is as
unified as possible. While single topics are treated in a series of books like
those by M. G. Kendall and A. Stuart, E. Malinvaud, F. Schmidt, D. A. Ratkowski
and Y. Bard, and while there exist important survey papers for some fields, we
are not aware of any comparable presentation of these fields. Essential parts
of the book are characterized by results of the authors’ own research, some of
them unpublished until now.

The book is addressed to statisticians in research, teaching, and applications
and to mathematicians who want to be informed on the fields mentioned above,
i.e. on statistical inference on parameters of nonlinear regression functions,
on models with errors in the variables, or on robust methods for regression
parameters. The reader should have a basic knowledge of probability theory
and mathematical statistics, especially of regression analysis. The book is
designed in such a way that it can be used independently of the book
‘Statistical Inference in Linear Models’. The different problems and results
are presented with mathematical rigour and in a systematic way, which is
intended to be as unified as possible. Thus the different chapters are
coordinated on the one hand, but independently readable on the other. In order
not to exceed the already considerable size of the book, some results are
discussed without giving proofs. As in ‘Statistical Inference in Linear
Models’, the theorems and auxiliary results of linear algebra, probability
theory and statistics which are needed in the proofs are included in an
appendix in order to minimize the length of the proofs and to concentrate on
the specific aspects of the fields considered. Some of the results from the
appendix are new and have been derived exclusively for the solution of the
problems investigated. In references to results on inference for linear
models, the corresponding sections or results from ‘Statistical Inference in
Linear Models’ are mentioned.

In the following description of the content we cannot, in view of the size
of the book, give a complete account. We want to indicate the essential
orientation of the chapters, new results, and some very recent results of the
basic literature which have been included. Many of the results have been
obtained by the authors, a fact which will not always be explicitly stated.

Chapter 1 is devoted to the estimation of parameters of nonlinear regression
functions and to the testing of corresponding hypotheses. First we discuss why
approximation by linear models, or a transformation of nonlinear models to such
models, only sometimes gives a satisfactory solution. The main objects of the
investigations are the weighted least squares estimation and the maximum
likelihood estimation of regression parameters. Here an important extension of
the hitherto usual approach is taken as a basis, namely avoiding the
assumptions of adequacy of the regression function and of homogeneous
variances. Thus the basic asymptotic properties of the weighted least squares
estimator proved by R. I. Jennrich and E. Malinvaud, like consistency and
asymptotic normality, are generalized under a model without assumption of a
normal distribution, and an asymptotic analogue to the Gauss-Markov theorem is
proved. Under the assumption of a normal distribution, stronger optimality
properties (BAN property) are shown, which also characterize the normal
distribution in the case of identical error distributions. Corresponding
results are also derived for the maximum likelihood estimator under more
general assumptions on the error distribution. A similar asymptotic theory is
also derived for the residual estimators of the variance. Asymptotic tests for
hypotheses on regression parameters based on the least squares estimator and on
the likelihood ratio statistic are investigated and their asymptotic power
under local alternatives is given. Confidence regions are surveyed.

A separate extensive section deals with models with changes of state, for
which different regression functions hold for certain subsets of observations.
Models with abrupt and with continuous changes of state are considered. For
the estimation of the change points and of the regression parameters we
investigate suitable methods, especially least squares estimators, as well as
their properties, such as consistency. We discuss how tests and methods of
cluster analysis may be used for decisions on the presence of change points.
The section gives a survey of the literature on models with changes of state.

A further section is devoted to the asymptotic optimality of nonparametric
estimators of regression functions. Optimality is considered in the local
asymptotic minimax sense connected with the work of Ibragimov and Khasminski.
Their results, together with those of Stone and others on lower bounds for the
asymptotic minimax risk and on functions attaining these bounds, are reviewed.
Moreover, the role of least squares splines and classical smoothing splines as
asymptotically optimal estimators and the corresponding L2-risk convergence
order are discussed, including results of Agarwal, Studden, Cox and others.
Exact constants for the optimal convergence and corresponding estimators
connected with interpolating splines are derived based on results of Pinsker
on observations following a linear stochastic differential equation.

In Chapter 2 we develop the theory of robust methods for inference on linear
parameters in linear models with independent, identically and continuously
distributed errors. Following an introductory discussion of robustness, the
treatment is mainly on asymptotic properties of L-estimators, which are based
on linear combinations of order statistics, of R-estimators, which are derived
from rank tests or rank-dependent criteria, and of M-estimators, which are
computed by generalizations of the least-squares criterion. For the special
case of the location model a finite sample minimax property proved by P. Huber
is shown for the M-estimator which is defined by Huber’s ψ-function.
Asymptotic normality is shown under certain regularity assumptions for
M-estimators of linear parameters. According to a theorem by P. Huber, it turns
out that Huber’s M-estimator has an asymptotic minimax property. This property
also holds for the L- and R-estimators which are asymptotically equivalent to
Huber’s M-estimator. For the asymptotic equivalence of M-, L-, and R-estimators
we give equations between the corresponding weighting and score-generating
functions. Numerical algorithms for the computation of M-estimators are
extensively discussed following ideas of P. Huber and R. Dutter. As a basis for
rank methods we first derive the known locally most powerful rank tests for the
hypothesis of a vanishing regression part and for the hypothesis of symmetry.
For the corresponding linear rank statistics and signed rank statistics,
respectively, the asymptotic normality under the null hypothesis is shown
following J. Hájek.

Then we prove the uniform asymptotic linearity in the regression parameters
of the linear rank statistics and we give the theorem of C. van Eeden
for the signed rank statistics. By means of this property we can obtain the
asymptotic normality of R-estimators, which also allows the derivation of the
asymptotic efficiency under various distribution assumptions. The asymptotic
normality of a linearized version of rank estimators is also studied along the
lines of C. van Eeden. Since the asymptotically efficient R-estimators depend
on the unknown density, three asymptotically efficient adaptive methods are
introduced, which were investigated by J. Hájek, R. Beran, and C. van Eeden.
Asymptotic confidence intervals for one-dimensional regression coefficients
are constructed from rank tests. We show that the ratio of the lengths of the
standard and the rank confidence intervals converges to the Pitman
asymptotic relative efficiency of the standard and rank tests. A sequential
confidence interval with given length, which has been given by F. J. Anscombe,
J. Geertsema, and M. Ghosh and P. K. Sen and which is based on the Wilcoxon
test, is also discussed.

Chapter 3 gives an introduction to the topic of models with
errors-in-variables as well as a comprehensive presentation of the classical
and recent results. After the discussion of simple examples from applications,
we explain why the least squares estimators from regression models may perform
badly if there are errors in the variables.

General model formulations are exhaustively discussed. Within a survey on
identifiability statements we give, among other things, the theorems by
Reiersøl and the result which states that the structural parameter is not
consistently estimable in a model with nonrandom experimental design if it is
not identifiable in a corresponding model with random design. Maximum
likelihood estimators are considered first of all for bivariate linear
functional relations. In doing so, following N. R. Cox and G. R. Dolby, models
with nonrandom as well as those with normally distributed random experimental
design may be considered simultaneously. Then we deal with multivariate models
with nonrandom experimental design. The relation between maximum likelihood and
least squares estimators is described. Under the assumption of independent
measurement errors the maximum likelihood estimator may be obtained from an
eigenvalue problem. Besides this known result we derive statements on
equivariance and uniqueness. For models with a covariance matrix which is known
up to an unknown factor, the coordinate-free approach allows a condensed
unified presentation of some known results. The theorem by T. W. Anderson
on the maximum likelihood estimator under normally distributed independent
errors with unknown covariance, which has been overlooked for a long time,
leads to the solution of an eigenvalue problem. For nonlinear models we outline
possibilities for simplification on the basis of special assumptions on the
covariance of the errors and identifiability properties. For nonlinear models
with replications of a fixed experimental design we show, as in Chapter 1,
the consistency, asymptotic normality, and optimality of the weighted least
squares and the maximum likelihood estimator, respectively.

The asymptotic normality is also shown for a modified Gauss-Newton iteration
suggested by W. A. Fuller and K. M. Wolter. Explicit formulas and
estimates for the asymptotic variances and covariances of the generalized
least squares estimators are given in the bivariate case. As alternatives to
maximum likelihood estimators we consider, among others, instrumental
variables estimators. The relations between known estimators for parameters
in functional relations and in simultaneous equations of econometrics are
clarified following ideas of T. W. Anderson and are connected with an
approximate power comparison of the estimators. In the same manner we compare
the modified maximum likelihood estimators and the two-stage least squares
estimators investigated by W. A. Fuller. For linear models in which
measurement errors and design points are generated by time series we treat
consistency, asymptotic normality, and identifiability for an estimator
introduced by P. M. Robinson.

A separate extensive section is devoted to a uniform asymptotic theory of
linear models with nonrandom experimental design. Following the explanation
of the parametrization and a collection of results on maximum likelihood
estimators, we develop a general formulation of ‘canonical’ variables
estimators, as a formal special case of which the maximum likelihood estimator
arises. The consistency of such estimators is proved under certain assumptions,
which are additionally interpreted. Some special cases known from the
literature are discussed. Now, the asymptotic efficiency of the maximum
likelihood estimator cannot be proved because of the infinite-dimensional
parameter space. Hence, assuming a normal distribution, we prove a limit normal
distribution and, in connection with this, the efficiency in a certain
heuristically motivated class of estimators. Besides the maximum likelihood
estimator, this class contains the most important alternative estimators
investigated in the literature. Moreover, a method of improvement yields an
easily computable efficient estimator. Many known results on limit
distributions and comparisons follow from the general theorems.

Based on the results of T. W. Anderson we give a survey on tests and
confidence regions for linear models. Finally we describe possibilities for the
numerical calculation of weighted least squares estimators. For bivariate
linear models with different but known covariances for each design point we
describe a method of J. H. Williamson. A Newton-Raphson type method is
discussed following M. O’Neill and L. G. Sinclair for bivariate polynomial
models. The special structure of the Gauss-Newton methods is discussed for
general errors-in-variables models and other models.


Helga Bunke Olaf Bunke 


Contents 


Chapter 1 Parameter estimation and testing hypotheses in nonlinear models 


1.1      Parameter estimation in nonlinear models
1.1.1    [illegible]
1.1.2    [illegible]
1.1.3    Least squares estimation
1.1.4    [illegible]
1.1.5    Consistency and asymptotic distribution of the least squares estimation
1.1.5.1  Introduction
1.1.5.2  The model and assumptions
1.1.5.3  Consistency
1.1.5.4  Further assumptions
1.1.5.5  Asymptotic distributions
1.1.5.6  Special cases and related results
1.1.6    [illegible]
1.1.7    Asymptotic results for estimators and tests of the variance
1.1.8    Tests and confidence regions for regression coefficients
1.2      Switching regression models
1.2.1    Introduction
1.2.2    Ordered models with abrupt state switching
1.2.2.1  [illegible]
1.2.2.2  Least squares estimators
1.2.2.3  Testing the presence of a state switching
1.2.3    Models with continuous state switching
1.2.3.1  [illegible]
1.2.3.2  Least squares estimators
1.2.3.3  [illegible]
1.2.4    Some asymptotic results on least squares estimators in ordered models [remainder illegible]
1.2.5    Some other models with state switching
1.2.6    Methods of identification of state switching in models with unknown [remainder illegible]
1.3      Some topics in nonparametric regression
1.3.1    [illegible]
1.3.2    Optimal rates of convergence
1.3.3    [illegible]
1.3.4    Optimal rates and exact constant
1.3.5    [illegible]
1.4      References
1.4.1    References for Section 1.1
1.4.2    References for Section 1.2
1.4.3    References for Section 1.3


Chapter 2 Robust statistical inference in linear models 


2.1 


2.8 


3.1 
3.1.1 
3.1.2 
3.1.3 
3.1.3.1 
3.1.3.2 
3.1.3.3 


3.1.3.4 
3.1.3.5 
3.1.3.6 
3.1.4 


General remarks ONTO DUSURCSS Merce shel oie chek stay onatelelferaser cn ateretcrs sicielne cee ers 134 
Robust alternatives to the method of least squares..................- 136 
Det sin verte ro} e: Manco ae RODEN eis GeO OOO moma Gd CaS eRe ans Gro ua AS chic 137 
IMGs bimia Fore: ae, 1: Fis c sor cists yotes eterna velioss oreravoiste Oirodesecnslcaaoieestacere arian aia 138 
Reestimabors sci iaters cee va Pae ears ion then tedes eee eet horace ay | aoa g Tere Re 139 
12} TS ANTI AOHIUEC SHAE ANIT US So Shona goo HOA nome GeoU Sod omude Pom eed 141 
Finite sample minimax properties of M-estimators in the location model 142 
Altermatiye choice: of themp-funCtiomys c-serscc rks re ole opel ate s coeleieiear siete 146 
Computational aspects and numerical algorithms .................... 148 
Asymptotic properties of M-estimators .......5.....52.2.-...0...:8 152 
Asymptotic normality of M-estimators. 22.5... 02.5. see eee es won 152 
Asymptotic minimax properties of M-, R-, and L-estimators .......... 160 
Somespropertiesioi&wankitests Uayrpise an creer. ciouraie Kotter ek 164 
Hhocallyanostipowerlalrank tester te. teres sachets a nleeret ieee oe 166 
Asymptotic behaviour of rank and signed-rank test statistics ......... 170 
Estimators of regression coefficients based on rank tests ............. 178 
Asymptotic normality of R-estimators © 2s. pase «sick eiecis <> scl e 179 
Linearized rank estimators and their asymptotic distribution ......... 185 
Adaptive rank estimators ..... 187
Asymptotic comparison of different estimation procedures ............ 191 
Asymptotic distribution of the difference of M- and R-estimators ..... 191 
Asymptotic distribution of the difference of linearized rank estimator
and M-estimator ..... 193
Confidence intervals for regression coefficients based on ranks ........ 194 
Asymptotic efficiency of rank confidence intervals .................. 196 
Bounded length confidence intervals based on the Wilcoxon test....... 199 
References ..... 202


Models with errors-in-variables ..... 214
Functional and structural relations — an introduction ..... 214
Comparison with regression models ..... 217
Models with errors-in-variables ..... 220
The fundamental model ..... 220
Linear functional relations ..... 225
Linear functional relations with fixed experimental design and with linear
regression part
General models with errors-in-variables ....... 0.0... 0.00. 0e0cuvvase. 228 
Regression models ..... 229
Functional relations with random experimental design ............... 231 


Identifiability ..... 232




3.4.1 


On the existence of consistent estimators under a nonrandom experimental
design ..... 238


BuO eT ANU e OMNIS EES Me eMC ar iets sf O8 Li PRE ae Meo is 240 
Maximum likelihood estimators ..... 247
Bivariate linear functional relations ..... 247
aE SCUICE AT MOU OR Me Sa fics Rn eee AYe falda bis. eis ne » abies nln whew Oe 247 
Repeated observations ..... 249
Observations without replications.............0.. 0.00 cece eee cece ees 253 
Maximum likelihood and least squares estimators ................... 255 
Estimation procedures for models with errors-in-variables............. 255 
Maximum likelihood estimation ..... 256
Least squares estimation ..... 256
Measurability and uniqueness ..... 258
Linear functional relations with nonrandom experimental design and
known covariance ..... 258
PEK GSMO Clmasrag haat traci geeh ticker ee eee: Shee A eM Bas Sa ee 258 
Least squares estimation ..... 259
EGUIVaTIONEO: s. CLAMS tcl aos: Rea eeatt Noe Os VOR ee SEO cane 263 
Linear functional relations with nonrandom nonobservable variables and
linear regression part ..... 264
Linear functional relations with nonrandom experimental design and
covariance known up to a factor ..... 264
Linear functional relations with nonrandom experimental design under
independent normally distributed errors ..... 266
Nonlinear models with known error covariance ..... 273


Models with unknown error covariance under normally distributed errors 277 


Further estimation procedures ..... 278
Linear functional relations with independent errors ................-- 279 
PTI TROCUCLIOIg pemeiic tourette eerie) ee nO aT eR a Ne OM ee eee ee Gee kot 279 
Ordinary and orthogonal least squares estimation .................-. 279 
Instrumental variables ..... 280
Use of variance components ..... 283
Limited-information maximum likelihood and two-stage least squares 

CUOMO) 5 6 dig Ripe po ad SOS. Son. Be Riopermon aap ee Get coleen Ol 283 
Modified maximum likelihood estimation ...............-.0e eee eeee 285 
Linear functional relations with dependent errors ................+.- 287 
Nonlinear models with independent errors ................-.++-+--:- 291 
Modified least squares estimation ..... 291
Different covariances ..... 294
Unknown different covariances ..... 294
Estimation with instrumental variables in linear functional relations ... 295 
VTE YO MIKE GIN + Stig G.c.cics Pa Cyne eT Oo GO On rane o UEP OE a> IB 295 
A general model for linear functional relations with nonrandom unobser- 

WANS WOME IES 4. on eee ores) Se con one Deon nies Seo. aa9!a oar 296 
Maximum likelihood estimation in linear functional relations with non- 

random unobservable variables ............0.2+ceccec seers en etess 303 
Estimation using instrumental variables in linear functional relations .. 309 
Estimation using instrumental variables in linear functional relations 

with nonrandom unobservable variables .......... sees eee cece eee 315 
Asymptotic theory for linear functional relations with nonrandom unob- 

servable variables and with independent errors ..........--+++++-++: 316 
IGAEOVS WOR CYN 4 ocdnobo cubed Sop L RON F000 0 Cr tIC OREO Ole OncuDiCe: Ora oma 316 


3.4.2 Consistency ..... 325
3.4.3 1Dp- akin) creme crepe OG orc oe HON Od nid obo aiden oto cn ocoo do al 80 332 
3.4.4 Asymptotic normality under normal distribution ...................- 336 
3.4.5 NEV TNS Gauge. ec hdode moors osroounnopenodaccooddGHOC OOD 343 
3.4.6 The general nonnormal case ..... 356
3.4.7 Final remarks ..... 360
3.5 Special asymptotics ..... 362
3.5.1 Asymptotics under fixed experimental design ..... 363
3.5.1.1 The model ..... 363
3.5.1.2 Asymptotics for maximum likelihood estimators .................-.. 364 
3.5.1.3 The information matrix under a normal distribution ................. 365 
3.5.1.4 Asymptotics for weighted least squares estimators ................+-- 365 
3.5.1.5 Asymptotic optimality of weighted least squares estimators .......... 368 
3.5.1.6 | Asymptotic covariance of the estimator of the structural parameters .. 369 
3.5.2 Comparison of MLE and two-stage LSE in linear functional relations with
nonrandom unobservable variables ..... 371
3.5.3 Comparison of modified MLE and two-stage LSE in linear functional re- 
lations with nonrandom unobservable variables...................--- 373 
3.5.4 Asymptotics under dependent errors in linear functional relations ..... 375 
Sap MNT COME OITA) 5 ooanoagosaddoscdoopcceoopagucuDomeS 375 
3.5.4.2 Identifiability ..... 377
3.5.5 Nonlinear models with increasing experimental design................ 378 
3.6 Testing hypotheses in linear functional relations ..................-. 384 
3.6.1 Tests on the dimension of the subspace ..... 385
3.6.2 Tests under a given subspace ..... 386
3.6.3 Tests on the existence of a linear functional relation ................. 387 
3.7 Confidence regions in linear functional relations ..................... 388 
3.7.1 The case of subspaces of codimension one ..... 388
3.7.2 Consistency of confidence regions ..... 389
3.7.3 Bivariate linear models ..... 389
3.8 IN CIMeTI OSs Srey agate toe a yeneeeeee ee eo br ce agentes ceeds cue gona naire ae eee 390 
3.8.1 Linear functional relations ..... 390
3.8.1.1 Bivariate linear functional relations with nonrandom unobservable
variables ..... 390
3.8.1.2 Bivariate linear functional relations with random unobservable variables ..... 391
3.8.1.3 Multivariate linear functional relationships with nonrandom unobservable
variables ..... 392
3.8.2 ISI CVM ERNE eel antorebey IS EHHOMNT Sodlacencusdagooncscaucgsdeeobos node: 392 
3.8.2.1 Polynomial relations ..... 392
Soares SING AOI MENON ROC UIE Basen saan couaneccccg oes 168k occcodesd 392 
Biiety-riap a ied MARS P eM eQOF AUB Saber Ee Ain oreo A INN OO ME mere OREO D Gok OO UU Oto G Un Ge oc 393 
3.8.3 General models with errors-in-variables ..... 394
3.8.3.1 Conditions for an application of the procedures ..... 394
3.8.3.2 Gauss-Newton procedures ..... 395
3.8.3.3 Simplified Gauss-Newton procedures ..... 396
3.8.3.4 Modified Gauss-Newton procedures ..... 397
3.8.3.5 Newton-Raphson procedures ..... 397
3.9 References ..... 398
3.9.1 References for errors-in-variables models ..... 398
3.9.2 Further references ..... 412


A ARE MeN FN a eI rsa iach ok aroha ns Se «sip oy a ne VA EON 415 
Al SE COLO es iy AOS RAGS ale » xoe'olees be ger Fast) 0b be 415 
A2 EA OMT TR GT a/R hein! 2s ole rid», 29"? he A i> gD Dla ee o-eSae aA 417 
A3 DALI SU nist koopa te A DAC RE OO Oe ee 426 
A4 IN OLALIG ECMO LOMIR MOLI Vitae oon ce A fog ar Sk) 6 Feolale HIT oe DS Dies 4 432 
A4.1 OT OCADIAGIIO BNE PALE UN, HAs oe coo Aare roa AMO os io TAG ES at gi 432 
A4.2 VOLO AINA LIICOS ABDACCOE ofoL. (ie. kp oi bes ost eto Ae ie Ae ve e BES Moet 433 
A4.3 REGGIE MEACLI ON Sate ere rae Rone Ghar RET eC ieee aT oie Ops MEO 435 
A4A MEER ALIA OSTA OUINOCOIS . 17) scars snes Cee blr co tielteg ec Katey Boe 436 
A4.5 WISE MAIOUE HIND MICARNTOS f,- /A0 G5 do"F9. Sid PORE OT ATES OI eae ting 437 
A4.6 POUNCE CONC OE re Ahir tl site e ork Hive el ote olete 04 tens Fe Fe wists 438 
A4.7 SS ATEN OE CHIOUS Go Ft INAS oor tie te ened’ sistas + ioia 20a Tiuare Revel me 438 
GUE TGGE? Jt Seles ae AM ea ts - cae in tir aR PRT San 7 Se eRe So. 439 
ISOC ANIAOK | oir 98 ale tos, 2 rote to ley? Ree AA Ss le EM se aiuto Aas a ets Boreas 445 


| 2 Nonlinear Regression 


Chapter 1 


Parameter estimation and testing hypotheses 
in nonlinear models 


While a powerful theory of ‘small’ samples has been worked out for linear
regression models (compare Bunke and Bunke, 1986), in the nonlinear case
effective concepts of statistics fail because of the complicated structure of
the parameter space. Geometrical approaches as in linear estimation theory,
or the theory of exponential families in the normal model, are no longer
available. Heuristically motivated estimators such as the least squares
estimator need iteration procedures. They are not unbiased, and their bias and
variance can only be determined approximately (cf. equations (1.1.9) and
(1.1.10)).

In order to avoid these difficulties the statistician will first of all try to
approximate the nonlinear model by a linear one, or to reach a linear model by
an appropriate data transformation (Section 1.1.4). But, when approximating,
parameter interpretations and typically nonlinear effects, which may be
exactly what interests the scientist, often get lost. With the data
transformation, which is possible in exceptional cases only, the effect on the
error structure has to be taken into account.


One possibility for the theoretical treatment of the nonlinear regression model
consists in establishing an asymptotic theory (‘large’ sample size). If the
consistency of the parameter estimator has been shown (see Theorem 1.1.1),
then it is natural to suppose that, for large sample size, the model can be
considered as approximately linear, and hence that the properties of this
linear model are reflected, e.g. in limit distributions of the parameter
estimator (see Theorem 1.1.2).


If we drop the supposition of the adequacy of the regression model in these
investigations, the asymptotic results provide important information about
the robustness of the methods with respect to model errors. At the same time
new starting points offer themselves for a theory of model choice. The analogy
to the linear model, which shows in the structure of the limit distributions of
the estimators, leads to the following heuristic explanation for the
established statistical methods. The methods (least squares estimators, test
statistics, etc.) are constructed on the basis of the nonlinear structure of
the model, but their assessment by (asymptotic) distributional properties
(optimality, quantiles, etc.) is based on linear approximations. The nonlinear
character of a model will appear in the distributional properties only if
higher order asymptotics are used.

From this point of view it is not very surprising that there are many
analogies to the linear model when we assess the goodness of the nonlinear
least squares estimators by the covariance structure of their limit
distribution (Section 1.1.7). Without assuming normally distributed errors we
can prove an asymptotic analogue of the Gauss-Markov theorem (Theorem 1.1.3).
Under normal distribution, stronger optimality properties (BAN) can be proved,
which also characterize the normal distribution in the case of an identical
error distribution (Theorem 1.1.5). Corresponding investigations may also be
carried out for the maximum likelihood estimator under general suppositions on
the error distribution (see Theorem 1.1.7). An asymptotic theory similar to
that for the parameters of the regression function may be derived for the
residual estimator of the variance (Section 1.1.8).

For testing hypotheses, two procedures offer themselves: on the one
hand, we can, analogously to the linear model, construct tests on the basis of
the limit distribution of the least squares estimator (cf. equation (1.1.9)).
On the other hand, we can also verify the asymptotic distribution statements
($\chi^2$) for likelihood ratio test statistics, which are known for the case
of identically distributed observations, for the nonlinear regression model
(Theorem 1.1.13). The evaluation of the power of the tests under local
alternatives is made possible by means of the concept of contiguous
distributions.

An important model type, which formally belongs to the nonlinear regression 
models, but which plays a special role because of its specific structure, is 
represented by models with changes of state (Section 1.2). A certain regression 
setup only holds for a subset of the observations, otherwise another one does 
so. The point (perhaps date) at which this transition between the different 
states of the system happens is an additional nonlinear parameter of special 
interest. Problems from many fields of practice lead to such models
(Section 1.2.1).

Depending on the transition conditions for the regression functions in the 
transition points, we distinguish models with abrupt (Section 1.2.2) and con- 
tinuous (Section 1.2.3) changes of state. The difficulties resulting for the nu- 
merical calculation of the least squares estimator in such models suggest a 
skilful combination of test and estimation methods for the analysis of
transition points. This is the reason why the discussion of various tests for
checking the state stability takes up comparatively much space.

The consistency of the least squares estimator demands a modified consi- 
deration as the parameter space is generally not compact, but depends on the 
sample size, and the regression function does not continuously depend on the 
parameter (Section 1.2.4). 

Summarizing, we can state that for models with changes of state, methods
have to be developed in which the numerical implementability is
considered from the very beginning.




1.1 Parameter estimation in nonlinear models 


1.1.1 Introduction


Let a relation between $x$ and $y$ be described by a function
$f: \mathcal{X} \to \mathbb{R}^1$, which is called a regression function (cf.
Bunke and Bunke, 1986, ch. 1). It is supposed that, with a fixed value $x$ of
the regressor, the value $y$ of the regressand is random and has the expected
value $f(x)$. The available information on $f$ is expressed by giving a set
$\mathcal{F} = \{g_\vartheta \mid \vartheta \in \Theta\}$
($\Theta \subset \mathbb{R}^r$) of functions
$g_\vartheta: \mathcal{X} \to \mathbb{R}^1$, which is known to contain the
‘true regression function’ $f$, i.e. there is a $\vartheta_0 \in \Theta$
(‘true regression parameter’) with $f = g_{\vartheta_0}$. If $g_\vartheta$ is
nonlinear in $\vartheta$, then we call it a nonlinear regression function. If
we have observations $y_t$ of the regressand at values $x_t$ of the regressor,
$t = 1, \ldots, n$, then we get the problems:


1. Estimation of $f$ or of the values of $f$ on a subset
   $\mathcal{X}_0 \subset \mathcal{X}$.

2. Estimation of $\vartheta_0$ and of derived parameters $\gamma(\vartheta_0)$.

3. Approximation of $f$ by a function from a given set $\mathcal{F}$ of
   functions that may be of a simple structure or may have other advantages.

4. Confidence regions for $\vartheta_0$.

5. Testing of hypotheses on $f$ or $\vartheta_0$.


For the mathematical treatment of these problems we need appropriate
suppositions on the values $x_t$, $y_t$.

The values $x_t$ of the regressor, which are called design points, are given by
a vector $\xi = (x_1, \ldots, x_n)$, the experimental design. Moreover, the
equations

    $y_t = f(x_t) + \varepsilon_t, \quad t = 1, 2, \ldots, n$    (1)

hold, where $\varepsilon_1, \ldots, \varepsilon_n$ are independent random
variables with

    $\mathrm{E}\varepsilon_t = 0$ and $\mathrm{D}\varepsilon_t = \sigma_t^2$.    (2)

The variances $\sigma_t^2$ are unknown, and we can make any suppositions of the
form $\sigma^2 := (\sigma_1^2, \ldots, \sigma_n^2) \in \mathcal{S}$. Later on
we will need such suppositions for the whole sequence of the $\sigma_t$,
$t = 1, 2, \ldots$, in the section on asymptotic behaviour.

For the approximation of the expected value we establish a parametric
function class $\{g_\vartheta \mid \vartheta \in \Theta\}$, where
$g_\vartheta: \mathcal{X} \to \mathbb{R}^1$ is a given function for each fixed
$\vartheta \in \Theta$. As usual let us suppose that

    $f \in \mathcal{F}$;    (3)

then we say that the model is adequate ($f = g_{\vartheta_0}$). Later on we
will sometimes drop this assumption. In the asymptotic considerations we also
permit the functions $g_\vartheta$ to depend on $n$: $g_\vartheta^{(n)}$.

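According to the above suppositions the $y_t$ are independent random
quantities, the distribution of which is determined by the parameter
$\zeta := (f, \kappa)$, where $\kappa$ denotes the distribution of
$\varepsilon := (\varepsilon_1, \ldots, \varepsilon_n)'$. By giving a set
$\mathcal{K}$ of distributions with $\kappa \in \mathcal{K}$, the preliminary
knowledge of $\kappa$ can be characterized.

Following the notation of Bunke and Bunke (1986), for an arbitrary function
$g: \mathcal{X} \to \mathbb{R}^1$ we will denote the vector
$(g(x_1), \ldots, g(x_n))'$ by $g^\xi$. With

    $\mathcal{F}^\xi = \{(g_\vartheta(x_1), \ldots, g_\vartheta(x_n))' \mid \vartheta \in \Theta\}$,
    $\mathcal{V} = \{\mathrm{Diag}[\sigma_1^2, \ldots, \sigma_n^2] \mid \sigma^2 \in \mathcal{S}\}$

the set

    $[\mathcal{F}^\xi \times \mathcal{V}]_{\mathcal{K}} = \{W_\zeta \mid \zeta = (f, \kappa) \in \mathcal{F} \times \mathcal{K}\}$    (4)

of possible distributions $W_\zeta$ of the random vector $y$ that is restricted
by (1), (2), (3) is an adequate distribution model, which we call a nonlinear
model, or, in the special case of $\mathcal{F}^\xi$ being a linear subspace of
$\mathbb{R}^n$, a linear model.
$y \in [\mathcal{F}^\xi \times \mathcal{V}]_{\mathcal{K}}$ is a way of writing
that there exists a $W_\zeta \in [\mathcal{F}^\xi \times \mathcal{V}]_{\mathcal{K}}$
with $y \sim W_\zeta$. In the literature also the short form

    $\mathrm{E}y \in \mathcal{F}^\xi, \quad \mathrm{D}y \in \mathcal{V}$

or also

    $y = \mu + \varepsilon, \quad \mu \in \mathcal{F}^\xi, \quad \mathrm{E}\varepsilon = 0, \quad \mathrm{D}\varepsilon \in \mathcal{V}$

is to be found if $\mathcal{K}$ is the set of all distributions with (2).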

Surveys on statistical methods in nonlinear regression problems and related
problems were given, e.g. in Bard (1974), Cox (1977) and in Bunke, Henschke,
Strüby, and Wisotzki (1977).


1.1.2 Examples 

For illustrative purposes we first of all give some typical examples of regression 
models. 

(a) Linear regression model 

If the regression function $g_\vartheta(x) = \vartheta' g(x)$ is linear in
$\vartheta$, then (1) and (2) are called a linear regression model (cf. Bunke
and Bunke, 1986).

(b) Empirical growth curves 


Empirical growth curves occur in biological or chemical problems where a
quantity $u_t$ at time $t$ is observed. If we suppose that the speed of growth
$du_t/dt$ is proportional to the quantity just achieved and to the difference
between a maximally achievable quantity $b$ and $u_t$, then the differential
equation

    $du_t/dt = a\,u_t\,(b - u_t)$

holds. Integration, taking the logarithm, reparametrization, and assumption
of an additive error structure lead to the model

    $y_t = \alpha - \ln(1 + e^{-(\lambda + \beta t)}) + \varepsilon_t
         = g_\vartheta(t) + \varepsilon_t, \quad \vartheta = (\alpha, \lambda, \beta).$    (5)

This model was investigated by various authors, e.g. Nelder (1961, 1962).
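
The step from the differential equation to the log-scale model can be checked
numerically. The following is a minimal sketch, not the authors' procedure: it
assumes that the exponent is parametrized as $-(\lambda + \beta t)$ with
$\beta = ab$ and $\alpha = \ln b$, and the function names `logistic` and
`log_growth` are illustrative only.

```python
import math

def logistic(t, a, b, lam):
    """Closed-form solution of du/dt = a*u*(b - u); lam fixes the initial value."""
    return b / (1.0 + math.exp(-(lam + a * b * t)))

def log_growth(t, alpha, lam, beta):
    """Assumed log-scale regression function: alpha - ln(1 + exp(-(lam + beta*t)))."""
    return alpha - math.log1p(math.exp(-(lam + beta * t)))

a, b, lam = 0.3, 5.0, -2.0
t, h = 1.7, 1e-6

# Central difference approximation of du/dt at time t.
du_dt = (logistic(t + h, a, b, lam) - logistic(t - h, a, b, lam)) / (2 * h)
u = logistic(t, a, b, lam)

# Residual of the differential equation (close to zero up to discretization error).
ode_residual = du_dt - a * u * (b - u)

# ln(u_t) agrees with the reparametrized form for alpha = ln(b), beta = a*b.
log_residual = math.log(u) - log_growth(t, math.log(b), lam, a * b)
```

Both residuals vanish up to rounding and discretization error, which shows
that the regression function on the log scale is nonlinear in
$(\lambda, \beta)$ but linear in $\alpha$.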


(c) Exponential models 


For many applications, mixtures of exponential functions are appropriate:

    $g_\vartheta(x) = \alpha_0 + \sum_{s=1}^{p} \alpha_s e^{\beta_s x_{(s)}}$,    (6)
    $\vartheta = (\alpha_0, \alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_p)$,
    $x = (x_{(1)}, \ldots, x_{(p)})$, $x_{(i)} \in \mathbb{R}^1$.

For special cases specific methods of estimation have been developed (e.g.
McGilchrist, 1968; Rasch, 1967; Agha, 1971; Saleh and Choudry, 1975). Models
of the form (6) are also investigated under the name of ‘multicompartment
systems’.


(d) Cobb-Douglas models 


In economic problems, e.g. in the investigation of production or demand
functions, regression functions of the following type occur:

    $g_\vartheta(x) = \alpha\, x_{(1)}^{\beta_1} \cdots x_{(p)}^{\beta_p}$,
    $\vartheta = (\alpha, \beta_1, \ldots, \beta_p)$, $x = (x_{(1)}, \ldots, x_{(p)})$.

Such regression functions are called Cobb-Douglas functions. Sometimes a
multiplicative model is assumed in order to obtain an additive error structure
again after taking the logarithm:

    $y_t = g_\vartheta(x_t)\,\varepsilon_t, \quad \mathrm{E}\varepsilon_t = 1, \quad \mathrm{D}\varepsilon_t = \sigma_t^2$.

A detailed discussion is given, e.g. in Goldberger (1968) and Goldfeld and
Quandt (1972).




1.1.3 Least squares estimation 


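When estimating linear parameters $\gamma = C\vartheta$, a restriction to
linear estimators $\hat{\gamma} = Ly$ makes sense in linear regression models.
This is not true for nonlinear models since the domain of values of $\gamma$ is
not a linear space in general. Suppose we want to estimate
$\gamma = g_\vartheta^\xi := (g_\vartheta(x_1), \ldots, g_\vartheta(x_n))'$;
then it seems reasonable to use estimation functions $\hat{\gamma}$ with values
in $\mathcal{F}^\xi := \{g_\vartheta^\xi \mid \vartheta \in \Theta\}$. Such
estimation functions are, e.g. projections $g_{\hat{\vartheta}}^\xi$ of $y$ on
$\mathcal{F}^\xi$, where we can suppose e.g. a norm of the form

    ${}^{w}|y|_n = \Bigl\{ n^{-1} \sum_{t=1}^{n} w_t y_t^2 \Bigr\}^{1/2}, \quad (w_t > 0;\ t = 1, \ldots, n)$

such that

    ${}^{w}|y - g_{\hat{\vartheta}}^\xi|_n = \min_{\vartheta \in \Theta} {}^{w}|y - g_\vartheta^\xi|_n$    (8)

holds.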

By choosing the weights $w = \{w_t,\ t = 1, 2, \ldots\}$ differently, we obtain
different estimators. Since we allow the heteroscedastic case, it is natural
to take the inverse variances as weights. If these are not known, we can
insert estimators instead and thus construct two-stage estimators. This is
also the reason why we allow the weights $w_t$ to be positive random
quantities, which may also depend on the sample size $n$ in the asymptotic
investigations ($w_t^{(n)}$).


Definition 1.1.1 Let a $\vartheta_0 \in \Theta$ with $f = g_{\vartheta_0}$
exist. An estimator $\hat{\gamma}$ of $\gamma(\vartheta)$ is called a weighted
least squares estimator (WLSE) if $\hat{\gamma}(y) = \gamma(\hat{\vartheta}(y))$
and if the estimator $\hat{\vartheta}$ is a solution of (8). With $w_t = 1$
($t = 1, \ldots, n$) ($w = w_0$), $\hat{\gamma}$ is called an ordinary least
squares estimator (OLSE). With $w_t = \sigma_t^{-2}$ ($t = 1, \ldots, n$)
($w = w_\sigma$), $\hat{\gamma}$ is called a generalized least squares
estimator (GLSE).
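
The weighted norm and the minimization (8) can be sketched for a
one-parameter model. This is an illustration under stated assumptions, not a
method from the text: the regression function $g_\vartheta(x) = e^{-\vartheta x}$,
the design, the variances and the golden-section search are all hypothetical
choices, and the data are noise-free so that the minimizer is known.

```python
import math

def weighted_norm(resid, w):
    """Weighted norm: sqrt(n^-1 * sum_t w_t * r_t^2)."""
    n = len(resid)
    return math.sqrt(sum(wt * r * r for wt, r in zip(w, resid)) / n)

def g(theta, x):
    """Hypothetical one-parameter regression function exp(-theta * x)."""
    return math.exp(-theta * x)

def wlse(x, y, w, lo, hi, tol=1e-10):
    """Golden-section search for the minimizer of the weighted norm on [lo, hi].

    Assumes the criterion is unimodal on the interval.
    """
    def crit(theta):
        return weighted_norm([yt - g(theta, xt) for xt, yt in zip(x, y)], w)

    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - r * (b - a), a + r * (b - a)
        if crit(c) < crit(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
    return (a + b) / 2.0

x = [0.5, 1.0, 1.5, 2.0, 3.0]
sigma2 = [0.01, 0.01, 0.04, 0.04, 0.09]     # variances, assumed known here
y = [g(0.8, xt) for xt in x]                # noise-free data with theta_0 = 0.8

theta_ols = wlse(x, y, [1.0] * len(x), 0.0, 3.0)            # w_t = 1
theta_gls = wlse(x, y, [1.0 / s for s in sigma2], 0.0, 3.0)  # w_t = 1 / sigma_t^2
```

With noise-free data both weightings return the true parameter. With noisy
data the two estimates differ, because observations with large variance are
down-weighted in the second call.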


If we speak of least squares estimators (LSE) or their generalizations in the
following, we always assume that there exists a solution $\hat{\vartheta}(y)$
of (8) and that it can be represented as a measurable function in $y$.

If $g_\vartheta$ has for example a representation

    $g_\vartheta(x) = \alpha' h_\beta(x) \quad (\vartheta = (\alpha, \beta) \in \Theta = \mathbb{R}^p \times \mathcal{B},\ x \in \mathcal{X})$

with $h_\beta: \mathcal{X} \to \mathbb{R}^p$, where $\mathcal{B}$ is a compact
subset of a Euclidean space, and if $h_\beta(x)$ is continuous in $\beta$ for
each fixed $x \in \mathcal{X}$, then the existence of a measurable
$\hat{\vartheta}(y)$ can easily be proved.

If $\varepsilon_t \sim N(0, \sigma_t^2)$, $t = 1, \ldots, n$, holds, the GLSE
obviously coincides with the maximum likelihood estimator (MLE). In general, a
WLSE cannot be computed explicitly; iterative numerical methods have to be
applied. The WLSE is in general not unbiased, and an exact computation of
the bias is not possible.




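Box (1971) derived a rough approximate formula for the bias of the OLSE under

    $\varepsilon_t \sim N(0, \sigma^2) \quad (t = 1, \ldots, n)$    (9)

by approximating $y - g_{\hat{\vartheta}}^\xi$ by a Taylor expansion of second
order about $\vartheta_0$ and by taking
$\hat{\vartheta} - \vartheta_0 \approx A\varepsilon + (\varepsilon' B_1 \varepsilon, \ldots, \varepsilon' B_r \varepsilon)'$.
This yields:

    $\mathrm{E}(\hat{\vartheta} - \vartheta_0) \approx -\frac{\sigma^2}{2}
        \Bigl[\sum_{j=1}^{n} F_j F_j'\Bigr]^{-1}
        \sum_{i=1}^{n} F_i \,\mathrm{tr}\Bigl\{\Bigl[\sum_{l=1}^{n} F_l F_l'\Bigr]^{-1} H_i\Bigr\}$    (10)

with

    $F_t = F_t(\vartheta_0) = \frac{\partial g_\vartheta(x_t)}{\partial \vartheta}\Big|_{\vartheta = \vartheta_0}$,

    $H_t = \Bigl(\frac{\partial^2 g_\vartheta(x_t)}{\partial \vartheta_i\, \partial \vartheta_j}\Bigr)_{i,j = 1, \ldots, r}\Big|_{\vartheta = \vartheta_0}$.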
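
For a single parameter the bias approximation reduces to a ratio of two sums.
The sketch below assumes the scalar specialization
$-(\sigma^2/2) \sum_t F_t H_t / (\sum_t F_t^2)^2$ of Box's formula and uses a
hypothetical model $g_\vartheta(x) = e^{-\vartheta x}$, so that
$F_t = -x_t e^{-\vartheta x_t}$ and $H_t = x_t^2 e^{-\vartheta x_t}$; the
design and the variances are illustrative values only.

```python
import math

def box_bias_scalar(x, theta, sigma2):
    """Approximate bias of the OLSE for the scalar model g(x) = exp(-theta * x).

    Scalar form assumed here: -(sigma2 / 2) * sum(F*H) / (sum(F*F))**2.
    """
    F = [-xt * math.exp(-theta * xt) for xt in x]        # first derivative in theta
    H = [xt * xt * math.exp(-theta * xt) for xt in x]    # second derivative in theta
    s_ff = sum(f * f for f in F)
    s_fh = sum(f * h for f, h in zip(F, H))
    return -0.5 * sigma2 * s_fh / (s_ff * s_ff)

x = [0.5, 1.0, 1.5, 2.0, 3.0]
bias = box_bias_scalar(x, theta=0.8, sigma2=0.01)
bias_4x = box_bias_scalar(x, theta=0.8, sigma2=0.04)
```

The approximate bias is proportional to $\sigma^2$ and, for this model,
positive, since $F_t H_t < 0$ at every design point.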


There are various reasons to drop the supposition (3): $f \in \mathcal{F}$. On
the one hand it simply cannot be known whether the model with $\mathcal{F}$ is
adequate, hence whether (3) holds. Regression models are often applied although
there is little knowledge about the true structure of the dependences. The
problem of model choice is complicated in the nonlinear case, and it is not
advisable to assume adequacy from the very beginning. Furthermore, the actual
problem may consist in approximating the regression function within a given
function class $\mathcal{F}$. If we do not aim at the adequacy of the model,
we may perhaps simplify the optimization problem (8) by an appropriate choice
of $\mathcal{F}$.

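In order to find the connection with the later asymptotic considerations we
still want to generalize (8). Let
$\mathcal{F}_n = \{g_\vartheta^{(n)} \mid \vartheta \in \Theta\}$ be given
function classes ($\mathcal{X} \to \mathbb{R}^1$), and
$w = \{w_t^{(n)} \mid t = 1, \ldots, n\}$ sequences of positive random
variables. Let

    $Q_n(\vartheta) := {}^{w}|y - [g_\vartheta^{(n)}]^\xi|_n$

and

    $Q_n(\hat{\vartheta}) = \min_{\vartheta \in \Theta} Q_n(\vartheta)$.    (11)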

Definition 1.1.2 $\hat{f}_n := g_{\hat{\vartheta}}^{(n)}$ is called the
weighted inadequate least squares approximation (WILSA) of $f$ if
$\hat{\vartheta}$ fulfils (11). The respective parameter estimator
$\hat{\vartheta}$ we call WILSE.


Apart from the heuristic sense of the optimization problems (8) and (11), we
associate with them the idea that, with a growing number of observations, the
WILSE $\hat{\vartheta}$ converges to $\vartheta_0$ and that the WILSA
increasingly well approximates the projection of $f$ on $\mathcal{F}_n$ in the
sense of the seminorm $|f| := {}^{w}|f^\xi|_n$. These problems will be
investigated in Section 1.1.5.

The proof of the good approximation properties of the WILSA is of fundamental
importance for nonlinear regression models. The assumption of a certain
function class $\mathcal{F}$ almost always represents an approximation in
applications, i.e. the assumption of the adequacy of the model (‘there is a
$\vartheta_0$ with $g_{\vartheta_0} = f$’) is violated and at most
approximately guaranteed (‘there is a $\vartheta_0$ with
$g_{\vartheta_0} \approx f$’). Good properties of the WILSA correspond to the
stability properties of the least squares method under such inaccuracies of
the model.

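If some of the components of $\vartheta$ enter linearly into $g_\vartheta$,
i.e. if

    $g_\vartheta(x) = \alpha' h_\beta(x) \quad (\vartheta = (\alpha, \beta) \in \Theta = \mathbb{R}^p \times \mathcal{B},\ x \in \mathcal{X})$    (12)

with $h_\beta: \mathcal{X} \to \mathbb{R}^p$, then the dimension of the
nonlinear optimization problem (8) may be reduced (see also Lawton and
Sylvestre, 1971; Barham and Drane, 1972). With

    $P_\beta = H_\beta H_\beta^{+}, \quad H_\beta^{+} = (H_\beta' D_w H_\beta)^{-1} H_\beta' D_w$,
    $H_\beta = ((h_{\beta,i}(x_t)))_{t = 1, \ldots, n;\ i = 1, \ldots, p}$
    ($h_{\beta,i}$ denotes the $i$th component of $h_\beta$),
    $D_w = \mathrm{Diag}[w_1, \ldots, w_n]$

(8) transforms to

    ${}^{w}|y - g_{\hat{\vartheta}}^\xi|_n = \min_{\beta \in \mathcal{B}} {}^{w}|y - P_\beta y|_n = {}^{w}|y - P_{\hat{\beta}}\, y|_n$,    (13)
    $\hat{\vartheta} = (\hat{\alpha}, \hat{\beta}), \quad \hat{\alpha} = H_{\hat{\beta}}^{+} y$.

This fact is illustrated by the following simple example.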


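Example 1.1.1 Let

    $g_\vartheta(x) = \alpha_1 + \alpha_2 e^{\beta x} \quad (\vartheta = (\alpha_1, \alpha_2, \beta))$.

With $w_t = 1$, (13) is given by

    $\min_{\beta} \sum_t \bigl[y_t - \hat{\alpha}_1(\beta) - \hat{\alpha}_2(\beta)\, e^{\beta x_t}\bigr]^2$,

    $\hat{\alpha}_1(\beta) = n^{-1} \Bigl[\sum_t y_t - \hat{\alpha}_2(\beta) \sum_t e^{\beta x_t}\Bigr]$,

    $\hat{\alpha}_2(\beta) = \Bigl[\sum_t e^{2\beta x_t} - n^{-1} \Bigl(\sum_t e^{\beta x_t}\Bigr)^2\Bigr]^{-1}
        \Bigl[\sum_t y_t e^{\beta x_t} - n^{-1} \sum_t y_t \sum_t e^{\beta x_t}\Bigr]$.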
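
The reduction to a one-dimensional search over $\beta$ can be sketched as
follows. This is a minimal illustration with $w_t = 1$: the data are
noise-free and generated from hypothetical values
$(\alpha_1, \alpha_2, \beta) = (2, 1.5, -0.7)$, and a coarse grid search
stands in for a proper one-dimensional optimizer.

```python
import math

def alpha_hat(beta, x, y):
    """Closed-form OLS estimates of (alpha_1, alpha_2) for a fixed beta."""
    n = len(x)
    z = [math.exp(beta * xt) for xt in x]
    sz, sy = sum(z), sum(y)
    szz = sum(zt * zt for zt in z)
    szy = sum(zt * yt for zt, yt in zip(z, y))
    a2 = (szy - sy * sz / n) / (szz - sz * sz / n)
    a1 = (sy - a2 * sz) / n
    return a1, a2

def profile_rss(beta, x, y):
    """Residual sum of squares with the linear parameters eliminated."""
    a1, a2 = alpha_hat(beta, x, y)
    return sum((yt - a1 - a2 * math.exp(beta * xt)) ** 2 for xt, yt in zip(x, y))

x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [2.0 + 1.5 * math.exp(-0.7 * xt) for xt in x]   # noise-free observations

# Grid over beta in [-2.0, -0.1]; beta = 0 is excluded because the model
# is then not identifiable (exp(beta * x) is constant).
grid = [-2.0 + 0.01 * k for k in range(191)]
beta_hat = min(grid, key=lambda b: profile_rss(b, x, y))
a1_hat, a2_hat = alpha_hat(beta_hat, x, y)
```

Only $\beta$ is searched numerically; $\hat{\alpha}_1$ and $\hat{\alpha}_2$
follow from the closed-form expressions, so the three-dimensional problem
becomes a one-dimensional one.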


1.1.4 Linearization 


In this section we assume the adequacy of the model, hence the validity of (3). 
As the nonlinearities cause profound difficulties for the treatment of the model, 
an approach that suggests itself is to try a linearization. There are various 
possibilities to linearize the model (1): 


(a) Embedding into a larger linear model
    $[\mathcal{L} \times \mathcal{V}]_{\mathcal{K}}$ with
    $\mathcal{F}^\xi \subset \mathcal{L}$.

(b) Approximation by a smaller linear model
    $[\mathcal{L} \times \mathcal{V}]_{\mathcal{K}}$ with
    $\mathcal{L} \subset \mathcal{F}^\xi$.

(c) Approximation by a linear model without the assumptions (a) or (b).

(d) Linearization by transformation.


(a) Mostly the embedding into a larger linear model has the disadvantage
that the best unbiased linear estimator of $\mu = \mathrm{E}y$ used in the
linear model $[\mathcal{L} \times \mathcal{V}]_{\mathcal{K}}$ is not efficient
in the original model. Often $\mathcal{F}^\xi \subset \mathcal{L}$ is only
fulfilled for $\mathcal{L} = \mathbb{R}^n$ and the problem becomes
uninteresting. Moreover, the estimators may have values in
$\mathcal{L} \setminus \mathcal{F}^\xi$, so that the interpretation of such
estimation values is difficult. But this procedure is sometimes a useful first
step in statistical analysis. For instance, this is the case if, with fixed
$x$-values, there have been repeated observations. In case we have a normal
model (normal distribution) and if $P^{w_\sigma}_{\mathcal{L}}$ is the
projector on $\mathcal{L}$ in the sense of the norm
${}^{w_\sigma}|\cdot|_n$, then the total ‘information’ from $y$ on $\mu$ and
$\sigma$ is contained in a certain sense in
$z(y) := [P^{w_\sigma}_{\mathcal{L}}\, y,\ y'(I_n - P^{w_\sigma}_{\mathcal{L}})\, y]$.
$z(\cdot)$ are sufficient statistics (see Bunke and Bunke, 1986,
Theorem 2.1.4). In this case $P^{w_\sigma}_{\mathcal{L}}\, y$ is sufficient
for $\mu$.


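Example 1.1.2 We consider a regression model with repeated observations,
i.e. the first $n_1$ design points $x_t$ equal $x^{(1)}$, the next $n_2$ equal
$x^{(2)}$, etc., so that there are $n_i$ observations each at $x^{(i)}$
($i = 1, \ldots, m$). With

    $U = \mathrm{Diag}[1_{n_1}, \ldots, 1_{n_m}]$,
    $g_\vartheta^0 = (g_\vartheta(x^{(1)}), \ldots, g_\vartheta(x^{(m)}))'$,
    $\mathcal{F}^\xi = \{U g_\vartheta^0 \mid \vartheta \in \Theta\}$ and $\mathcal{F}^\xi \subset \mathcal{L} = \mathcal{R}(U)$,

we then have for $\sigma_t = \sigma$:

    $P_{\mathcal{L}}\, y = U (\bar{y}^{(1)}, \ldots, \bar{y}^{(m)})'$    (14)

(i.e. the vector of the mean values
$\bar{y}^{(i)} = n_i^{-1} \sum_{s=1}^{n_i} y_{is}$ of the observations
belonging to the design point $x^{(i)}$, each repeated $n_i$ times) and

    $y'(I_n - P_{\mathcal{L}})\, y = \sum_{i=1}^{m} \sum_{s=1}^{n_i} (y_{is} - \bar{y}^{(i)})^2$.    (15)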
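
For replicated observations both statistics are elementary to compute. The
following sketch uses hypothetical data with three design points; the helper
name `replicate_summary` is illustrative only.

```python
def replicate_summary(groups):
    """groups[i] holds the n_i observations taken at the i-th design point."""
    means = [sum(g) / len(g) for g in groups]
    # Projection of y onto the column space of U:
    # every observation is replaced by the mean of its group.
    projection = [m for m, g in zip(means, groups) for _ in g]
    # Within-group (pure error) sum of squares.
    pure_error_ss = sum((v - m) ** 2 for m, g in zip(means, groups) for v in g)
    return means, projection, pure_error_ss

groups = [[1.0, 3.0], [2.0, 4.0, 6.0], [5.0]]
means, projection, pure_error_ss = replicate_summary(groups)

# The quadratic form y'(I - P)y equals y'y - (Py)'(Py).
y = [v for g in groups for v in g]
quadratic_form = sum(v * v for v in y) - sum(p * p for p in projection)
```

The group means and the within-group sum of squares do not depend on the
regression function, which is why they can be computed before any nonlinear
fitting is attempted.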


(b) and (c): The approximation by a smaller
($\mathcal{L} \subset \mathcal{F}^\xi$) linear model
$[\mathcal{L} \times \mathcal{V}]_{\mathcal{K}}$ means that we restrict
ourselves to estimation functions $\hat{\mu}: \mathbb{R}^n \to \mathcal{L}$
with values in $\mathcal{L}$ when estimating $\mu$. This seems to be reasonable
provided that $f^\xi$ lies ‘in the neighbourhood’ of $\mathcal{L}$. The bias of
such estimation functions

    $B(f) = \|\mathrm{E}\hat{\mu}(y) - f^\xi\|_A, \quad A \in \mathcal{M}_n^{\geq}$

becomes smallest iff

    $\|\mathrm{E}\hat{\mu}(y) - P^A_{\mathcal{L}} f^\xi\|_A = 0$    (16)

holds, where $P^A_{\mathcal{L}} u$ denotes a certain element of $\mathcal{L}$
with $\min_{z \in \mathcal{L}} \|u - z\|_A = \|u - P^A_{\mathcal{L}} u\|_A$
(projection in the sense of the seminorm $\|\cdot\|_A$) for all
$u \in \mathbb{R}^n$. In the class of estimation functions that satisfy the
‘generalized unbiasedness’ (16) or conditions of the minimal bias for all
admissible parameters, the estimation function $\hat{\mu}$ with minimal risk
$\mathrm{E}\|\hat{\mu}(y) - f^\xi\|_A^2$ can be determined in the normal model.
According to Bunke and Bunke (1986, Theorem 2.7.2), it is given by
$\hat{\mu}(y) = P^A_{\mathcal{L}} P_e\, y$.

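We are led to similar problems if we want to approximate the function $f$ by
a function $\hat{f} = \hat{\alpha}' h$ from a linear function space

    $\mathcal{F}_0 = \{\alpha' h \mid \alpha \in \mathbb{R}^k\} \subset \mathcal{F}, \quad h: \mathcal{X} \to \mathbb{R}^k$,

where the goodness of the approximation is characterized by a seminorm
$\|\cdot\|$ in $\mathcal{F}$ (i.e. by $\|f - g\|$). In this case

    $\mathcal{L} = \mathcal{R}(H), \quad H = ((h_i(x_t)))_{t = 1, \ldots, n;\ i = 1, \ldots, k}$.

These problems were treated in detail in Bunke and Bunke (1986, ch. 2) and
Hoffmann (1977). There the demand for a minimal risk
$\mathrm{E}\|f - \hat{\alpha}(y)' h\|^2$ among all
$\hat{f} = \hat{\alpha}' h$ with

    $\|\mathrm{E}\hat{f} - P_{\mathcal{F}_0} f\| = 0$    (17)

($P_{\mathcal{F}_0} f$: projection of $f$ onto $\mathcal{F}_0$), as well as the
minimization of the maximal risk (with respect to
$\zeta \in \mathcal{F} \times \mathcal{K}$) (minimax approximation) were
discussed.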

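Nonlinear (in $\vartheta$) regression functions ($g_\vartheta: \mathbb{R}^m \to \mathbb{R}^1$) are often approximated by Taylor series of $\gamma$th order at a point $x_0$:

$$\begin{aligned}
g_\vartheta(x) &\approx g_\vartheta(x_0) + \sum_{j=1}^{m} (x_j - x_{0j}) \frac{\partial}{\partial x_j} g_\vartheta(x)\Big|_{x=x_0} + \dots + \frac{1}{\gamma!}\Big((x - x_0)'\frac{\partial}{\partial x}\Big)^{\gamma} g_\vartheta(x)\Big|_{x=x_0} \\
&= \sum_{j=0}^{\gamma}\ \sum_{i_1+\dots+i_m=j} c_\vartheta(i_1,\dots,i_m)\,(x_1 - x_{01})^{i_1}\cdots(x_m - x_{0m})^{i_m}.
\end{aligned}$$

Thus, we approximate $g_\vartheta$ by a polynomial in $x$. If we extend the domain of values $\{c_\vartheta(i_1, \dots, i_m) \mid \vartheta \in \Theta\}$ in each case to $\mathbb{R}^1$, then we look for an approximation in the linear space of such polynomials: conclusions from the new parameter vector $c_\vartheta$ to the old parameter $\vartheta$ are not immediately possible after such extensions of the model.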
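The Taylor approximation can be illustrated numerically. The following Python sketch is only an illustration under assumptions: it uses the scalar exponential function $g_\vartheta(x) = \alpha e^{\beta x}$ as an example, and the function names are hypothetical. It computes the polynomial coefficients $c_\vartheta(j)$ at $x_0$ and compares the polynomial with the exact function.

```python
import math

def taylor_coefficients(alpha, beta, x0, order):
    """Coefficients c_j of the Taylor polynomial of alpha*exp(beta*x) at x0."""
    base = alpha * math.exp(beta * x0)
    # j-th derivative of alpha*exp(beta*x) at x0 is base * beta**j
    return [base * beta ** j / math.factorial(j) for j in range(order + 1)]

def evaluate_polynomial(coeffs, x, x0):
    """Evaluate sum_j c_j * (x - x0)**j."""
    return sum(c * (x - x0) ** j for j, c in enumerate(coeffs))

# Third-order approximation of 2*exp(0.5*x) around x0 = 0, evaluated at x = 0.4
coeffs = taylor_coefficients(alpha=2.0, beta=0.5, x0=0.0, order=3)
approx = evaluate_polynomial(coeffs, 0.4, 0.0)
exact = 2.0 * math.exp(0.5 * 0.4)
print(coeffs, approx, exact)
```

Here four coefficients are generated by only two parameters $(\alpha, \beta)$. Once the coefficients are allowed to vary freely, a fitted coefficient vector need not correspond to any pair $(\alpha, \beta)$, which is the loss of interpretability mentioned above.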

(d) By transformation many nonlinear regression functions, which often 
arise in applications, may be brought into a functional form which is linear 
in the parameters after the parametrization (see for instance Draper and 


1.1. Parameter estimation in nonlinear models 29 


Smith, 1966). If we consider e.g. the exponential model with $g_\vartheta(x) = \alpha e^{\beta x}$, then we obtain the linear function $\ln g_\vartheta(x) = a + \beta x$ with $a = \ln \alpha$. For the Cobb-Douglas function

$$g_\vartheta(x) = \alpha x_1^{\beta_1} \cdots x_k^{\beta_k}$$

we get $\ln g_\vartheta(x) = a + \sum_{i=1}^{k} \beta_i \ln x_i$. In certain physical problems the function $\alpha/(1 + \beta x)$ occurs, which can be transformed into a linear form by $1/g_\vartheta$. Hence, generalizing these examples we suppose that we have a model


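$$y_t = f(x_t) + \varepsilon_t, \quad t = 1, \dots, n \tag{18}$$

with $E\varepsilon = 0$ and $D\varepsilon = \sigma^2 I$, $f \in \{g_\vartheta \mid \vartheta \in \Theta\}$, and that there exists a real function $T$ with

$$T g_\vartheta(x) = \alpha'(\vartheta)\, h(x),$$

where $h$ is a known $k$-dimensional vector function.
It is a common procedure in practice to transform formally the model (18) into

$$T y_t = T f(x_t) + \eta_t, \quad t = 1, \dots, n, \tag{19}$$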


and to suppose for $\eta$ similar structures as for $\varepsilon$, which is of course not generally justified. In general the violation of the distribution assumptions for the 'error' $\eta$ leads to the LSE, which are formally computed in the 'wrong' linear model (19), being neither consistent nor having other good statistical properties (cf. e.g. Goldberger, 1968). Nevertheless this method can be justified as an approximation if only $\sigma$ is sufficiently small. This may be achieved for instance by a suitable design planning (choice of the $x_t$) and a 'model concentration' carried out before the transformation. Let us briefly explain this procedure, which is described in Bunke (1976, 1977).
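The formal transformation to model (19) can be sketched in Python for the exponential example $g_\vartheta(x) = \alpha e^{\beta x}$ with $T = \ln$. This is a minimal illustration under assumptions: the function names, the equidistant design on $[0, 1]$ and the normal errors are all hypothetical choices, and the fit is the ordinary least squares fit in the 'wrong' linear model, which is only reasonable for small $\sigma$.

```python
import math
import random

def fit_loglinear(xs, ys):
    """OLS fit of ln y = a + beta*x; returns (alpha, beta) with alpha = exp(a)."""
    logs = [math.log(y) for y in ys]  # requires all y > 0
    n = len(xs)
    xbar = sum(xs) / n
    lbar = sum(logs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxl = sum((x - xbar) * (l - lbar) for x, l in zip(xs, logs))
    beta = sxl / sxx
    a = lbar - beta * xbar
    return math.exp(a), beta

def simulate(alpha, beta, sigma, n, seed=0):
    """Additive-error observations y_t = alpha*exp(beta*x_t) + eps_t on [0, 1]."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    ys = [alpha * math.exp(beta * x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys

# Small sigma: the transformed model is a good approximation
xs, ys = simulate(alpha=2.0, beta=1.0, sigma=0.01, n=50)
alpha_hat, beta_hat = fit_loglinear(xs, ys)
print(alpha_hat, beta_hat)
```

With a larger `sigma` the logarithm distorts the additive errors more strongly (and may not even be defined for negative observations), which is why the text restricts the justification to sufficiently small $\sigma$.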

Suppose we have an experimental design with a spectrum $(x^{(1)}, \dots, x^{(m)})$ and denote the number of observations at the point $x^{(i)}$ by $n_i$. Then we can describe (18) in the following way:

$$y = U g_{\vartheta_0} + \varepsilon. \tag{20}$$

Now it is advisable not to pass directly from the model (18) to the model (19) with the transformation $T$, but first to form the mean values of the observations at the fixed design points in order to achieve variances that are as small as possible.

Let $\bar a = (\bar a_1, \dots, \bar a_m)'$, $\bar a_i = 1_{n_i}' a_i / n_i$, for $a = (a_1', \dots, a_m')' \in \mathbb{R}^n$, $a_i \in \mathbb{R}^{n_i}$.




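By taking the means, or by 'model concentration', we obtain from (20) that

$$\bar y = g_{\vartheta_0} + \bar\varepsilon. \tag{21}$$

If we apply the transformation $T$ to (20), then we get

$$t = T\bar y = H\alpha(\vartheta) + T(g_\vartheta + \bar\varepsilon) - T g_\vartheta \tag{22}$$

with

$$H = \begin{pmatrix} h'(x^{(1)}) \\ \vdots \\ h'(x^{(m)}) \end{pmatrix}.$$

Let $T$ be twice continuously differentiable. Then it holds that

$$t = H\alpha(\vartheta) + Q_\vartheta\bar\varepsilon + W_\vartheta\bar\varepsilon^2,$$

where $Q_\vartheta$ and $W_\vartheta$, respectively, are diagonal matrices with diagonal elements

$$\frac{dT(z)}{dz}\Big|_{z = g_\vartheta(x^{(i)})}, \qquad \frac{1}{2}\,\frac{d^2T(z)}{dz^2}\Big|_{z = g_\vartheta(x^{(i)}) + \delta_i};$$

the $\delta_i$ are certain values between 0 and $\bar\varepsilon_i$, and $\bar\varepsilon^2 = (\bar\varepsilon_1^2, \dots, \bar\varepsilon_m^2)'$. If $H$ has rank $k \le m$, then the estimator

$$\hat\alpha = (H'H)^{-1}H't \tag{23}$$

is obviously consistent because $\bar\varepsilon \to 0$ almost surely if $n_i \to \infty$ ($i = 1, \dots, m$).
The covariance structure of the approximate model

$$t = H\alpha(\vartheta) + Q_\vartheta\bar\varepsilon \tag{24}$$

with

$$D(Q_\vartheta\bar\varepsilon) = \sigma^2 Q_\vartheta\,\mathrm{Diag}[n_1^{-1}, \dots, n_m^{-1}]\,Q_\vartheta =: \sigma^2\Sigma_\vartheta$$

suggests applying the formal BLUE in (24) by using an 'initial estimator' $\hat\vartheta_0$:

$$\hat\alpha = (H'\Sigma_{\hat\vartheta_0}^{-1}H)^{-1}H'\Sigma_{\hat\vartheta_0}^{-1}\,t.$$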
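The two steps described above (model concentration, then transformation and a weighted fit) can be sketched as follows for the exponential model with $T = \ln$. This is a hedged illustration, not the authors' algorithm: the function names are hypothetical, and the weights $n_i \bar y_i^2$ are an assumed plug-in choice that follows from $\mathrm{Var}(\ln \bar y_i) \approx \sigma^2 / (n_i\, g_\vartheta(x^{(i)})^2)$ with $g_\vartheta(x^{(i)})$ replaced by $\bar y_i$ as an initial estimate.

```python
import math

def concentrate(groups):
    """Model concentration: replace the replicates at each design point by their mean."""
    return [sum(g) / len(g) for g in groups]

def weighted_line_fit(xs, ts, ws):
    """Weighted least squares fit of t = a + b*x with weights ws; returns (a, b)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    tbar = sum(w * t for w, t in zip(ws, ts)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxt = sum(w * (x - xbar) * (t - tbar) for w, x, t in zip(ws, xs, ts))
    b = sxt / sxx
    return tbar - b * xbar, b

def fit_concentrated_exponential(xs, groups):
    """Average first, then log-transform, then weighted linear fit."""
    ybar = concentrate(groups)
    ts = [math.log(y) for y in ybar]  # requires positive means
    # Assumed plug-in weights: inverse approximate variance of ln(ybar_i), up to sigma^2
    ws = [len(g) * y ** 2 for g, y in zip(groups, ybar)]
    a, beta = weighted_line_fit(xs, ts, ws)
    return math.exp(a), beta

# Three design points with three replicates each; perturbations cancel in the means
design = [0.0, 1.0, 2.0]
groups = []
for x in design:
    g = 2.0 * math.exp(0.5 * x)
    groups.append([g - 0.01, g + 0.01, g])
alpha_c, beta_c = fit_concentrated_exponential(design, groups)
print(alpha_c, beta_c)
```

Averaging before transforming reduces the variance of the quantity that enters the nonlinear map $T$, so the neglected second-order term $W_\vartheta\bar\varepsilon^2$ becomes small as the $n_i$ grow.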


1.1.5 Consistency and asymptotic distribution 
of the least squares estimation 


1.1.5.1 Introduction 


In this section we will give conditions for the consistency and the asymptotic 
normality of the LSE. The first papers devoted to this problem were those by 
Jennrich (1969) and Malinvaud (1970). The difference in the two approaches 
consists in the fact that under the assumption of the compactness of the 




parameter space the strong consistency is shown in the first work whereas 
in the second work the weak consistency is obtained without any assumptions 
of compactness. In this chapter we follow a representation by Bunke and 
Schmidt (1980), where the results by Jennrich are generalized. The assumption 
of compactness is weakened in the sense that parameters that linearly enter 
the regression function may vary in the whole Euclidean space. Furthermore, 
nonidentically distributed errors, especially heteroscedasticity, and the in- 
adequacy of the model are allowed. By omitting the inadequacy of the model 
we also include the combination of regression and approximation problems, 
which was introduced in Bunke and Bunke (1986) by the name ‘approgression’. 

The class of admissible estimation functions is extended by introducing the 
weighted sums of squares. In this section we furthermore present the results 
by Zwanzig (1980), which contain the consistency and asymptotic normality 
of the parameter estimation in the inadequate model. 

Multivariate generalizations, which we do not consider here, are given for instance by Malinvaud (1970), Barnett (1976), and Fedorov (1977). In Barnett (1976) conditions are formulated under which the multivariate LSE weighted with an estimated covariance matrix is equivalent to a consistent local maximum of the likelihood function. We do not take into account the case
that the sequence of errors is generated by a stationary time series, which 
suggests a different treatment (frequency domain). See, for example, Hannan 
(1971). 


1.1.5.2 The model and assumptions 


Again, we assume the model

$$y_t = f(x_t) + \varepsilon_t, \quad t = 1, \dots, n, \tag{25}$$

with an unknown regression function

$$f: \mathcal{X} \to \mathbb{R}^1, \quad f \in \mathcal{F}.$$

We discuss the asymptotic behaviour of the approgression estimation: in our formulation from Section 1.1.3 this means the behaviour of the weighted inadequate least squares approximation (WILSA) $g_{\hat\vartheta_n}$ and the weighted inadequate least squares estimator $\hat\vartheta_n$ (WILSE), as well as of the corresponding estimation under the supposition of adequacy '$f = g_{\vartheta_0}$'. Let us first of all introduce some notation:


For $l, k: \mathcal{X} \to \mathbb{R}^1$ we define

$$(l, k)_n := {}^{w}(l, k)_n := n^{-1}\sum_{t=1}^{n} w_t^{(n)}\, l(x_t)\, k(x_t)$$

and

$$|l|_n := {}^{w}|l|_n := \sqrt{{}^{w}(l, l)_n},$$

where

$$w := \{w_t^{(n)},\ t = 1, \dots, n;\ n = 1, 2, \dots\}$$

is a double sequence of random variables about which we assume that

$$s_n := \max_{1 \le t \le n} |w_t^{(n)} - u_t| \to 0 \quad \text{a.s.},$$

where $u =: u(w) = \{u_t,\ t = 1, 2, \dots\}$ is a sequence of positive numbers with $0 < \underline{u} \le u_t \le \bar u < \infty$, $t = 1, 2, \dots$.
Correspondingly we use the notation

$${}^{w}|y - l|_n^2 = n^{-1}\sum_{t=1}^{n} w_t^{(n)}\,(y_t - l(x_t))^2, \quad y \in \mathbb{R}^n,$$

and

$$(l, k)_n = (({}^{w}(l_i, k_j)_n))_{i=1,\dots,p;\ j=1,\dots,q}$$

for $l: \mathcal{X} \to \mathbb{R}^p$, $k: \mathcal{X} \to \mathbb{R}^q$. Analogously, ${}^{u}(l, k)_n$ and ${}^{u}|l|_n$ denote the corresponding quantities with the weights $u_t$ in place of $w_t^{(n)}$.

The WILSA $g_{\hat\vartheta_n}$ introduced in Definition 1.1.2 is the solution of the minimization problem $\min_{\vartheta \in \Theta} Q_n(\vartheta)$ with

$$Q_n(\vartheta) := Q_n^{w}(\vartheta) = {}^{w}|y - g_\vartheta^{(n)}|_n^2.$$

(Where no confusion is possible or where the dependence need not be specially emphasized, we omit the label for the weighting sequence $w$. Conversely, in this as well as in the following chapters, we indicate the dependence of the estimator $\hat\vartheta$ on the sample size by writing $\hat\vartheta_n$.)

First we formulate some assumptions: 


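A1 $\varepsilon_t$, $t = 1, 2, \dots$ are independent random variables with $E\varepsilon_t = 0$, $E\varepsilon_t^2 =: \sigma_t^2$, and we have
(a) $\varepsilon_t$, $t = 1, 2, \dots$ are identically distributed with $\sigma_t = \sigma$, or
(b) the $\varepsilon_t$ satisfy a modified Lindeberg condition:

$$\sigma_t^2 \ge \gamma > 0 \ \text{ for all } t \quad\text{and}\quad \sup_t \int_{|x| \ge c} x^2\, dF_t(x) \xrightarrow[c \to \infty]{} 0,$$

where $F_t$ denotes the distribution function of $\varepsilon_t$. Furthermore, let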


Oh SO CDs Sal PAs goa 


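$$\sum_{t=1}^{\infty} t^{-2} E\varepsilon_t^4 < \infty \quad\text{and}\quad \tau_n := n^{-1}\sum_{t=1}^{n} u_t\sigma_t^2 \xrightarrow[n \to \infty]{} \tau_u > 0$$

hold.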


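A2 For the given sequence of functions $g^{(n)}: \mathcal{X} \times \Theta \to \mathbb{R}^1$ we use the notation $g_\vartheta^{(n)} := g^{(n)}(\cdot, \vartheta)$. We suppose the functions to have the following representation:

$$g_\vartheta^{(n)}(x) = \alpha' h^{(n)}(x, \beta) \quad (h^{(n)}: \mathcal{X} \times \mathcal{B} \to \mathbb{R}^p)$$

with

$$\vartheta = (\alpha, \beta) \in \Theta = \mathbb{R}^p \times \mathcal{B} \quad (\mathcal{B} \text{ a compact subset of } \mathbb{R}^m).$$

There is a function $h: \mathcal{X} \times \mathcal{B} \to \mathbb{R}^p$, which is continuous in $\beta$ for any fixed $x \in \mathcal{X}$, so that with $h_\beta^{(n)} := h^{(n)}(\cdot, \beta)$ and $h_\beta := h(\cdot, \beta)$

$$\sup_{\beta \in \mathcal{B}} {}^{u}|h_\beta^{(n)} - h_\beta|_n \to 0$$

holds.

A3 (a) Let $\mathcal{H}$ denote the set of functions $\{f, h_{\beta(i)} \mid i = 1, \dots, p;\ \beta \in \mathcal{B}\}$ (where $h_{\beta(i)}$ is the $i$th component of $h_\beta$); then there exists a real number ${}^{u}(l, k)$ for all $l, k \in \mathcal{H}$, such that

$$\sup_{l,k \in \mathcal{H}} |{}^{u}(l, k)_n - {}^{u}(l, k)| \to 0$$

holds.
(b) For all $\beta \in \mathcal{B}$ let ${}^{u}(h_\beta, h_\beta)$ be a nonsingular matrix.

A4 Let a unique solution $\vartheta^f$ of

$$\min_{\vartheta \in \Theta} {}^{u}|f - g_\vartheta|^2 = {}^{u}|f - g_{\vartheta^f}|^2$$

exist.

A5 There is a $\vartheta_0 \in \Theta$ for which $f = g_{\vartheta_0}$. Here we have $g_{\vartheta_0} = \alpha_0' h_{\beta_0}$, $\vartheta_0 = (\alpha_0, \beta_0)$.

A6 For all $\vartheta, \bar\vartheta \in \Theta$ we have ${}^{u}|g_\vartheta - g_{\bar\vartheta}| = 0$ iff $\vartheta = \bar\vartheta$.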


The assumption A4 includes the asymptotic identifiability of the parameter $\vartheta^f$, to which the best approximation $g_{\vartheta^f}$ of the function $f$ in the sense of the norm ${}^{u}|\cdot|$ corresponds. With A5 the adequacy of the regression model is ensured and A6 provides the asymptotic identifiability of the 'true' parameter $\vartheta_0$.


1.1.5.3 Consistency 


The following theorem supplies the consistency of the WILSA, WILSE and 
WLSE. 




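Theorem 1.1.1
1. With A1 (a) or (b), A2 and A3, it holds almost surely that

$$\lim_{n \to \infty} {}^{w}|f - g^{(n)}_{\hat\vartheta_n}|_n = \lim_{n \to \infty} {}^{u}|f - g_{\hat\vartheta_n}| = \Delta_f, \tag{26}$$

where $\Delta_f := \min_{\vartheta \in \Theta} {}^{u}|f - g_\vartheta|$ (consistency of the WILSA).

2. With A1 (a) or (b), A2, A3, and A4, we have

$$\hat\vartheta_n \to \vartheta^f \quad \text{a.s.}$$ (consistency of the WILSE).

3. With A1 (a) or (b), A2, A3, and A5, it holds that

$$\Delta_f = 0 \quad\text{and}\quad Q_n(\hat\vartheta_n) \to \tau_u \quad \text{a.s.}$$ (consistency of the variance estimator).

4. With A1 (a) or (b), A2, A3, A5, and A6, we have

$$\hat\vartheta_n \to \vartheta_0 \quad \text{a.s.}$$ (consistency of the WLSE).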


We prepare the proof of the theorem by means of several lemmas.


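Lemma 1.1.1 A3 yields

$$\sup_{l,k \in \mathcal{H}} |{}^{w}(l, k)_n - {}^{u}(l, k)| \to 0 \quad \text{a.s.}$$

Proof. It holds that

$$\sup_{l,k \in \mathcal{H}} |{}^{w}(l, k)_n - {}^{u}(l, k)| \le \sup_{l,k \in \mathcal{H}} |{}^{w}(l, k)_n - {}^{u}(l, k)_n| + \sup_{l,k \in \mathcal{H}} |{}^{u}(l, k)_n - {}^{u}(l, k)| \to 0$$

because of

$$|{}^{w}(l, k)_n - {}^{u}(l, k)_n| \le \frac{s_n}{\underline{u}}\ {}^{u}|l|_n\ {}^{u}|k|_n \tag{27}$$

and

$$\sup_{l,k \in \mathcal{H}} |{}^{w}(l, k)_n - {}^{u}(l, k)_n| \le \sup_{l,k \in \mathcal{H}} {}^{u}|l|_n\ {}^{u}|k|_n\ \frac{s_n}{\underline{u}} \to 0 \quad \text{a.s.},$$

since $\sup_{l \in \mathcal{H}} {}^{u}|l|_n \to \sup_{l \in \mathcal{H}} {}^{u}|l| < \infty$ follows from A3.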

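Lemma 1.1.2 Let A1, A2, A3 be fulfilled. Then

$$\sup_{l \in \mathcal{H}} {}^{w}(l, \varepsilon)_n \to 0 \quad \text{a.s.}$$

Proof. First we show

$$\sup_{l \in \mathcal{H}} {}^{u}(l, \varepsilon)_n \to 0 \quad \text{a.s.} \tag{28}$$

Because of A1 the conditions of the strong law of large numbers are satisfied for ${}^{u}(l, \varepsilon)_n$. Hence ${}^{u}(l, \varepsilon)_n \to 0$ a.s.
For the proof of the uniform convergence (28) it is obviously sufficient, by the definition of $\mathcal{H}$, to show for fixed $i$ that

$$\sup_{\beta \in \mathcal{B}} {}^{u}(h_{\beta(i)}, \varepsilon)_n \to 0 \quad \text{a.s.}$$

Because of A3 and the continuity of $h_{\beta(i)}$ there exists, for each $\eta > 0$ and $\beta \in \mathcal{B}$, a neighbourhood $U_\beta$ of $\beta$ and an $n(\eta, \beta)$ with ${}^{u}|h_{\beta(i)} - h_{\bar\beta(i)}|_n < \eta/2$ for all $\bar\beta \in U_\beta$ and $n > n(\eta, \beta)$. Because of the compactness of $\mathcal{B}$ there is a finite cover $\mathcal{U} = \{U_{\beta_j}\}$ of $\mathcal{B}$ and an $n(\eta)$ such that

$$\sup_{\bar\beta \in U_{\beta_j}} {}^{u}|h_{\beta_j(i)} - h_{\bar\beta(i)}|_n < \eta$$

holds for $n > n(\eta)$ and $U_{\beta_j} \in \mathcal{U}$. Because of A1 we have ${}^{u}|\varepsilon|_n^2 \to \tau_u$ a.s. Now we consider a realization with

$${}^{u}|\varepsilon|_n^2 \to \tau_u \quad\text{and}\quad {}^{u}(h_{\beta_j(i)}, \varepsilon)_n \to 0.$$

Since

$$|{}^{u}(h_{\bar\beta(i)}, \varepsilon)_n| \le {}^{u}|h_{\bar\beta(i)} - h_{\beta_j(i)}|_n\ {}^{u}|\varepsilon|_n + |{}^{u}(h_{\beta_j(i)}, \varepsilon)_n|, \quad \bar\beta \in U_{\beta_j},$$

there is an $n(\eta)$ with $|{}^{u}(h_{\beta(i)}, \varepsilon)_n| < \eta$ for all $n > n(\eta)$ and $\beta \in \mathcal{B}$. Thus (28) is shown.
We use the representation

$${}^{w}(l, \varepsilon)_n = {}^{u}(l, \varepsilon)_n - \xi_n. \tag{29}$$

With this we have

$$\xi_n^2 = \Big(\frac{1}{n}\sum_{t=1}^{n} (u_t - w_t^{(n)})\, l(x_t)\, \varepsilon_t\Big)^2 \le \frac{s_n^2}{\underline{u}}\ {}^{u}|l|_n^2\ \frac{1}{n}\sum_{t=1}^{n} \varepsilon_t^2. \tag{30}$$

In the case of A1 (a) it follows that $\frac{1}{n}\sum_{t=1}^{n} \varepsilon_t^2 \to \sigma^2$ a.s. In the case of A1 (b) it follows that $\frac{1}{n}\sum_{t=1}^{n} (\varepsilon_t^2 - \sigma_t^2) \to 0$ a.s. according to Kolmogorov's strong law of large numbers, because

$$\sum_{t=1}^{\infty} t^{-2} D(\varepsilon_t^2 - \sigma_t^2) \le \sum_{t=1}^{\infty} t^{-2} E\varepsilon_t^4 < \infty.$$

Because of $s_n^2 \to 0$ a.s. and $\sup_{l \in \mathcal{H}} {}^{u}|l|_n < \infty$ we obtain $\sup_{l \in \mathcal{H}} \xi_n^2 \to 0$ a.s. from (30). Together with (28) and (29) this yields the assertion.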


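Because of A3, ${}^{u}|f - \alpha'h_\beta|$ is defined for all $\alpha \in \mathbb{R}^p$ and $\beta \in \mathcal{B}$. Consequently

$$\min_{\alpha \in \mathbb{R}^p} {}^{u}|f - \alpha'h_\beta|^2 = {}^{u}|f - \alpha_\beta'h_\beta|^2 =: d(\beta)$$

with

$$\alpha_\beta = {}^{u}(h_\beta, h_\beta)^{-1}\ {}^{u}(h_\beta, f). \tag{31}$$

$\alpha_\beta$ and $d(\beta)$ are continuous in $\beta$ because of A2 and A3. In $\hat\vartheta_n = (\hat\alpha_n, \hat\beta_n)$ the structure of $\hat\alpha_n$ is given by

$$\hat\alpha_n = {}^{w}(h^{(n)}_{\hat\beta_n}, h^{(n)}_{\hat\beta_n})_n^{-1}\ {}^{w}(h^{(n)}_{\hat\beta_n}, y)_n.$$

Furthermore, let

$$\hat\alpha_\beta := {}^{w}(h^{(n)}_\beta, h^{(n)}_\beta)_n^{-1}\ {}^{w}(h^{(n)}_\beta, y)_n.$$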
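The structure of $\hat\alpha_\beta$ suggests a two-level computation: for each fixed $\beta$ the linear parameter is obtained in closed form, and only $\beta$ has to be searched over the compact set $\mathcal{B}$. The following Python sketch illustrates this under assumptions that are not in the text: a scalar $\alpha$ ($p = 1$), the example $h_\beta(x) = e^{\beta x}$, a finite grid standing in for $\mathcal{B}$, and hypothetical function names.

```python
import math

def weighted_inner(ws, a, b):
    """(a, b)_n = n^{-1} * sum_t w_t * a_t * b_t."""
    return sum(w * ai * bi for w, ai, bi in zip(ws, a, b)) / len(ws)

def alpha_hat(beta, xs, ys, ws):
    """Closed-form linear parameter for fixed beta: (h, h)_n^{-1} (h, y)_n."""
    h = [math.exp(beta * x) for x in xs]
    return weighted_inner(ws, h, ys) / weighted_inner(ws, h, h)

def profile_criterion(beta, xs, ys, ws):
    """Weighted sum of squares with alpha already minimized out."""
    a = alpha_hat(beta, xs, ys, ws)
    resid = [y - a * math.exp(beta * x) for x, y in zip(xs, ys)]
    return weighted_inner(ws, resid, resid)

def profile_fit(xs, ys, ws, beta_grid):
    """Minimize the profile criterion over a finite grid of beta values."""
    beta = min(beta_grid, key=lambda b: profile_criterion(b, xs, ys, ws))
    return alpha_hat(beta, xs, ys, ws), beta

# Noise-free data from alpha = 2, beta = 0.5; grid over the compact interval [-1, 1]
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
ws = [1.0] * len(xs)
grid = [i / 100 for i in range(-100, 101)]
alpha_fit, beta_fit = profile_fit(xs, ys, ws, grid)
print(alpha_fit, beta_fit)
```

Because $\alpha$ is eliminated analytically, it may range over the whole of $\mathbb{R}^p$ while only the nonlinear part $\beta$ needs a compact range, which mirrors the weakened compactness assumption in A2.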


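Lemma 1.1.3 Let A1, A2, and A3 be fulfilled. Then

$$\hat\alpha_\beta - \alpha_\beta \to 0 \quad \text{a.s.} \tag{32}$$

holds and

$$\sup_{\beta \in \mathcal{B}} \|\hat\alpha_\beta - \alpha_\beta\| \to 0 \quad \text{a.s.} \tag{33}$$

Proof. From A2 it follows that $\sup_{\beta \in \mathcal{B}} {}^{u}|h^{(n)}_\beta|_n \to \sup_{\beta \in \mathcal{B}} {}^{u}|h_\beta|$, and as in the proof of Lemma 1.1.1 we show

$${}^{w}(h^{(n)}_\beta, h^{(n)}_\beta)_n \to {}^{u}(h_\beta, h_\beta) \quad \text{a.s.}$$

and

$${}^{w}(h^{(n)}_\beta, \varepsilon)_n = {}^{w}(h^{(n)}_\beta - h_\beta, \varepsilon)_n + {}^{w}(h_\beta, \varepsilon)_n \to 0 \quad \text{a.s.}$$

by exploiting Lemma 1.1.2.
With

$${}^{w}(h^{(n)}_\beta, f)_n - {}^{u}(h_\beta, f) \to 0 \quad \text{a.s.}$$

(32) follows.

As in the proof of Lemma 1.1.1 we then get

$$\sup_{\beta \in \mathcal{B}} \|{}^{w}(h^{(n)}_\beta, h^{(n)}_\beta)_n - {}^{u}(h_\beta, h_\beta)\| \to 0$$

and

$$\sup_{\beta \in \mathcal{B}} \|{}^{w}(h^{(n)}_\beta, f)_n - {}^{u}(h_\beta, f)\| \to 0.$$

A1 and Lemma 1.1.2 yield

$$\sup_{\beta \in \mathcal{B}} \|{}^{w}(h^{(n)}_\beta, \varepsilon)_n\| \to 0 \quad \text{a.s.},$$

which yields (33):

$$\sup_{\beta \in \mathcal{B}} \|{}^{w}(h^{(n)}_\beta, h^{(n)}_\beta)_n^{-1}\ {}^{w}(h^{(n)}_\beta, y)_n - {}^{u}(h_\beta, h_\beta)^{-1}\ {}^{u}(h_\beta, f)\| = \sup_{\beta \in \mathcal{B}} \|\hat\alpha_\beta - \alpha_\beta\| \to 0 \quad \text{a.s.}$$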
Be® 


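Proof of Theorem 1.1.1. Because of the continuity of $d(\beta)$ there exists a $\beta^f \in \mathcal{B}$ with $d(\beta^f) = \min_{\beta \in \mathcal{B}} d(\beta)$. According to this,

$$\Delta_f = {}^{u}|f - g_{\vartheta^f}| \quad\text{with}\quad \vartheta^f = (\alpha_{\beta^f}, \beta^f).$$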


On account of sup ||«s|| < co and Lemma 1.1.2, we obtain 
Be 


“(ap lia en = 94, (hp,> En “> 0. 
With Lemma 1.1.3 this yields 

“(tf — 98,2 &)n = “(t &)n — [on — BY (hg, En) 

— ap (hp, &)n > 0. (34) 

From A, it follows that 

TO a 
From Lemma 1.1.1 it follows that 

Disses ls i 98.) +.0 (36) 
(34), (35) and (36) yield 

“ly — 99,12 — “If — 92,1? — t» —+ 0 (37) 


Because of ||sup x,|| <oo and Lemma 1.1.3 it almost surely holds that 
lim sup ||&,|| < co and with A, it follows that 


(n) 


w (n)/2 
Oe a KE, 


n 


= &,"(hy, — hg, hy, — hg. )y Gn => 0. (38) 




From (37) and (38) we obtain 
O.(9.) — “If — 98,2 — tw = “ly — 98,/5 — “If — 96,1? — to 


+ “Igg. . + "(y — 93,98, — 94.) “= 0. (39) 


ww 


Analogously we can show that 
(84) — “If — gorl® — ty + 0. (40) 


Together with $Q_n(\hat\vartheta_n) \le Q_n(\vartheta^f)$ a.s., (36), (39), and (40) yield assertion 1.

If A5 is additionally fulfilled, then $\Delta_f = 0$ holds and assertion 3 is a direct conclusion from (39) and (26).
Now we prove assertion 2. Because of Lemma 1.1.3 it suffices to show that

$$\hat\beta_n \to \beta^f \quad \text{a.s.}$$

We put

$$d(\beta) := {}^{u}|f - \alpha_\beta'h_\beta|^2$$

and

$$d_n(\beta) := {}^{w}|y - \hat\alpha_\beta'h^{(n)}_\beta|_n^2 - \tau_u.$$

Analogous to the proof of (37) we can show, by applying A3, that

$$\sup_{\beta \in \mathcal{B}} |d_n(\beta) - d(\beta)| \to 0 \quad \text{a.s.} \tag{41}$$

Let $\bar\beta$ be a limit point of the sequence $\hat\beta_n$. Then, because of the compactness of $\mathcal{B}$, there exists a subsequence $\hat\beta_{n_k}$ converging to $\bar\beta$. Because of the continuity of $d(\beta)$ and of the uniform convergence of $d_n(\beta)$ to $d(\beta)$, it follows that $d_{n_k}(\hat\beta_{n_k}) \to d(\bar\beta)$ a.s. As $\hat\beta_{n_k}$ is WILSE, $d_{n_k}(\hat\beta_{n_k}) \le d_{n_k}(\beta^f)$, hence $d(\bar\beta) \le d(\beta^f)$. On account of A4, $d(\beta)$ has a unique minimum at $\beta = \beta^f$. Thus it almost surely holds that $\bar\beta = \beta^f$. The proof of assertion 4 follows under A5 and A6 immediately from assertion 2.


1.1.5.4 Further assumptions


Now we will investigate the limit distribution. For doing this we need some 
further assumptions. 


A7 $h_\beta^{(n)}$ and $h_\beta$ have continuous derivatives of first and second order with respect to $\beta$. Let A2 be true for these derivatives, too, and let A3 hold for the class


Ohpiy Oh ai 
Hy i= i) lea iin eee eT — 


ial SAE 1h Je caval Ge {TS Ta he Os 
OB, ” 2B,0B | ‘ 




Moreover, let ® 
Yn sup *[haiy — WM"), > 0, 
Be® 
"\Ohaiy heey 
op op 
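A8 (a) For all $i, j = 1, \dots, p + m$ and $k_{\vartheta^f} := k := \frac{\partial g_\vartheta}{\partial\vartheta}\Big|_{\vartheta = \vartheta^f} = (k_1, \dots, k_{p+m})'$ we have

$$c_{ij} = \lim_{n \to \infty} n^{-1}\sum_{t=1}^{n} \sigma_t^2 u_t^2\, k_i(x_t)\, k_j(x_t),$$

and the matrix $C = C(u) = ((c_{ij}))$ is regular.
(b) $\vartheta^f$ is an inner point of $\Theta$.
(c) It holds that

$$n^{-1/2}\sum_{t=1}^{n} k(x_t)\,\varepsilon_t\,(u_t - w_t^{(n)}) \to 0 \quad \text{a.s.}$$

Remark 1.1.1 With A5 ($f = g_{\vartheta_0}$) and A6 it holds that $\vartheta^f = \vartheta_0$, so that, in the adequate case, A8 is equivalent to the corresponding condition formulated with $\vartheta_0$.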
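A9 (a) The matrix

$$G(u) := \Big(\Big(\frac{\partial^2\ {}^{u}|f - g_\vartheta|^2}{\partial\vartheta_i\,\partial\vartheta_j}\Big|_{\vartheta = \vartheta^f}\Big)\Big)_{i,j = 1, \dots, m+p}$$

is regular.
(b) It almost surely holds that

$$\limsup_{n \to \infty} \sqrt{n}\ \|{}^{u}(k_{\vartheta^f}, f - g_{\vartheta^f})_n\| < \infty.$$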


1.1.5.5 Asymptotic distributions 


In the following theorem we derive the asymptotic distributions of WLSE and WILSE.


Theorem 1.1.2 Let A1 (a) or (b), A2, and A3 be satisfied.
1. Let A5, A6, A7 and A8 be fulfilled and let $B^{-1}(u) := {}^{u}(k, k)$ be regular. With the notation

$$M(u) = B(u)\, C(u)\, B(u)$$

it holds that

$$\mathcal{L}\{\sqrt{n}\,(\hat\vartheta_n - \vartheta_0)\} \to N(0, M(u))$$

(limit distribution of the WLSE).
2. If A4, A7, A8, and A9 are fulfilled, then

$$\mathcal{L}\{\sqrt{n}\,(\hat\vartheta_n - \vartheta^f - 2[G(u)]^{-1}\ {}^{u}(k, f - g_{\vartheta^f})_n)\} \to N(0, 4[G(u)]^{-1} C(u) [G(u)]^{-1})$$

(limit distribution of the WILSE).


In order to prove Theorem 1.1.2 we need the following lemma: 


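Lemma 1.1.4 Let $k$ be a vector function defined on $\mathcal{X} \to \mathbb{R}^{p+m}$. Under the assumptions A1 and A8 (c) and provided that the limit $c_{ij} := \lim_{n \to \infty} n^{-1}\sum_{t=1}^{n} \sigma_t^2 u_t^2\, k_i(x_t)\, k_j(x_t)$ exists and $C := ((c_{ij})) > 0$, it holds that

$$\mathcal{L}\{\sqrt{n}\ {}^{w}(k, \varepsilon)_n\} \to N(0, C).$$

Proof. Under A1 it holds, for

$$\lambda_n := \sqrt{n}\ {}^{u}(k, \varepsilon)_n,$$

that

$$\mathcal{L}\{\lambda_n\} \to N(0, C),$$

because of the central limit theorem in the formulation of Eicker (1966) (compare also Bunke and Bunke, 1986, [A 4.20]). On account of A8 (c) it further holds that

$$\zeta_n := n^{-1/2}\sum_{t=1}^{n} k(x_t)\,\varepsilon_t\,(w_t^{(n)} - u_t) \to 0 \quad \text{a.s.}$$

As a result of the representation

$$z_n := \sqrt{n}\ {}^{w}(k, \varepsilon)_n = \lambda_n + \zeta_n$$

we thus obtain

$$\mathcal{L}\{z_n\} \to N(0, C).$$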


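Proof of Theorem 1.1.2.
(a) Proof of assertion 1: We proceed from the derivative

$$\frac{dQ_n(\vartheta)}{d\vartheta}\Big|_{\vartheta = \hat\vartheta_n} = -2\ {}^{w}(k^{(n)}_{\hat\vartheta_n}, y - g^{(n)}_{\hat\vartheta_n})_n = 2\ {}^{w}(k^{(n)}_{\hat\vartheta_n}, g^{(n)}_{\hat\vartheta_n} - g^{(n)}_{\vartheta_0})_n - 2\ {}^{w}(k^{(n)}_{\hat\vartheta_n}, \varepsilon)_n + 2\ {}^{w}(k^{(n)}_{\hat\vartheta_n}, g^{(n)}_{\vartheta_0} - g_{\vartheta_0})_n$$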




and use a Taylor expansion of second order of gs at & = J. With 


1 dn € Gint 
Pr = n : = 
0 Do, q ow On = Vn (3, fe Bo) 


we have | 
Vn 4Q,(8) 
Ney we ey ars = nl Ldn n— Sn ni 
eae ee PalLidn + Ty — Sy + ty] = 0 (42) 
with 
w Ps 2 (n) 
Le iiks ees oe 
(xs Bo is og ae Be 


a Vn “(kg Is — 9 ay 


§, = Yn *(ho,, €)ns 
f= Vn “(ko, — kh. , €),.- 
Similarly to the proof of Lemma 1.1.1, it follows from A, and A, that 
L, + *(ko,, ko,) (43) 


From A, and #;, ~=+ 9%, we furthermore obtain that 


lal S lI(ko,, kg all Vo “lg — gol, => 0, (44) 
hp, — he, 
kg. — ko, =|. oni” , Ohe ee, 
a, Fee aaa 
OB |p=p OB |p—p, 


and from this 


a.s. 0, (45) 


Yn “(kg. — ko, ke. — ko,)n 


l€n| S Ela 


because of A,. 
Finally, from A8 (a) and Lemma 1.1.4, it follows that

$$\mathcal{L}\{\xi_n\} \to N(0, C(u)). \tag{46}$$

On account of A8 (b) and Theorem 1.1.1 we have

$$\varphi_n \to 1 \quad \text{a.s.} \tag{47}$$

Then assertion 1 of the theorem results from (42)-(47).




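Before we turn to the proof of assertion 2, we formulate one more lemma.
By $Q_n'(\vartheta)$ and $Q'(\vartheta)$ we denote the vectors of the first partial derivatives with respect to $\vartheta$ of ${}^{w}|y - g^{(n)}_\vartheta|_n^2$ and ${}^{u}|y - g_\vartheta|_n^2$, respectively, and correspondingly, by $Q_n''(\vartheta)$ and $Q''(\vartheta)$ the matrices of the second derivatives of ${}^{w}|y - g^{(n)}_\vartheta|_n^2$ and ${}^{u}|f - g_\vartheta|^2$.

Lemma 1.1.5 Under the assumptions in 2 of Theorem 1.1.2 we have

(a) $\sqrt{n}\ \|Q_n'(\vartheta) - Q'(\vartheta)\| \to 0$ a.s. for all $\vartheta \in \Theta$,

(b) $\|Q_n''(\tilde\vartheta_n) - Q''(\vartheta^f)\| \to 0$ a.s. for all sequences $\{\tilde\vartheta_n\}$ for which $\|\tilde\vartheta_n - \vartheta^f\| \le \|\hat\vartheta_n - \vartheta^f\|$ holds almost surely.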


Proof. Because of 


0,(3) = "(ks ke 2Y — $n 
and 
O(9) = —2 (ko, y — 9o)n 


and A,, the assertion (a) follows similarly as in the proof of Lemma 1.1.1: 
= sin Qn(8 B\ Sly — 99 ln Vr It(kS? — ko, kG? — ho)all 


+ |["*(ke, ke)all Vn “lgS” — gola ——> 0, 


as we can show 
0,(9) = “ly — 912 > “If — gol? + ty < 00 


analogous to the proof of (39). 


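Furthermore, with $K^{(n)}_\vartheta := \frac{\partial^2 g^{(n)}_\vartheta}{\partial\vartheta\,\partial\vartheta'}$, $K_\vartheta := \frac{\partial^2 g_\vartheta}{\partial\vartheta\,\partial\vartheta'}$, it holds that

$$Q_n''(\vartheta) = 2\ {}^{w}(k^{(n)}_\vartheta, k^{(n)}_\vartheta)_n - 2\ {}^{w}(K^{(n)}_\vartheta, y - g^{(n)}_\vartheta)_n \tag{48}$$

and

$$Q''(\vartheta) = 2\ {}^{u}(k_\vartheta, k_\vartheta) - 2\ {}^{u}(K_\vartheta, f - g_\vartheta). \tag{49}$$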


We consider the estimation 


On 


(n) kz. )y 


OKT.) — Q'S 2|"(k52, Bsn — “Cor, bor) 

+ 2l|*(eor, bor)n — “(hors kor ll + 2"(Ke, ~ Kors t — Gorn 
| 2|""(K3., gor —95.)n 
+ 2| "(Kz 8), 


+ 2ll"(Ko57, f — Jor)n — “(Kor f— gos) 


| 




On account of A; and A, the second and the fifth summand on the right-hand 
side tend to zero almost surely. 
According to Theorem 1.1.1, 9, “+ 9, and with A, it follows that 


(n) 
(ts, — kor, ke. — keor)a|| 22> 0 : 
and 
Zee seap a egy) Cat as. 9 (50) 
5 la oe a Seo rate > 
i,j 08 ; 09; o=3,, oo, 08; o=9!|n ; 


so that the first, second and third summand tend to zero almost surely. 
Because of (50), “|€|, ——> t, and Lemma 1.1.2 (for # = H ,), we have for 
the last summand 


(Kz, ),|] < const Z,"°|eln + IP°(Kors €)all 22> 0. 


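(b) Proof of assertion 2 of Theorem 1.1.2.
Now we proceed from a Taylor expansion of first order of $Q_n'$ at $\vartheta = \vartheta^f$:

$$0 = \varphi_n Q_n'(\hat\vartheta_n) = \varphi_n Q_n'(\vartheta^f) + \varphi_n Q_n''(\tilde\vartheta_n)\,(\hat\vartheta_n - \vartheta^f). \tag{51}$$

According to Lemma 1.1.5,

$$t_n := \sqrt{n}\,[Q_n'(\vartheta^f) - Q'(\vartheta^f)] \to 0 \quad \text{a.s.} \tag{52}$$

and

$$Q_n''(\tilde\vartheta_n) \to Q''(\vartheta^f) = G(u) \quad \text{a.s.} \tag{53}$$

With $V_n = {}^{u}(k_{\vartheta^f}, f - g_{\vartheta^f})_n$ we obtain from Lemma 1.1.4 and (52) that

$$\mathcal{L}\{\sqrt{n}\,[Q_n'(\vartheta^f) + 2V_n]\} = \mathcal{L}\{-2\sqrt{n}\ {}^{u}(k_{\vartheta^f}, \varepsilon)_n + t_n\} \to N(0, 4C). \tag{54}$$

From (47), (51), (53) and (54) it follows that

$$\mathcal{L}\{\sqrt{n}\,(\hat\vartheta_n - \vartheta^f - 2[G(u)]^{-1} V_n)\} \to N(0, 4[G(u)]^{-1} C(u) [G(u)]^{-1}).$$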


Remark 1.1.2 The model error produces the correction term $2[G(u)]^{-1} V_n$, i.e. $\hat\vartheta_n$ is consistent, but for $\sqrt{n}\,(\hat\vartheta_n - \vartheta^f)$ we cannot generally assume the validity of a limit distribution with the expected value zero. The correction term does not occur if $\vartheta^f$ is replaced by $\vartheta^f_n := \arg\min_{\vartheta \in \Theta} {}^{u}|f - g_\vartheta|_n$ (compare Bunke, 1981).


1.1.5.6 Special cases and related results 
Remark 1.1.3 Now we consider the special case of the homoscedastic ($\sigma_t^2 = \sigma^2$) adequate model ($f = g_{\vartheta_0}$) and of the nonweighted LSE. In this case the weight function is $w_t = 1$ and we omit the labelling $w$ and $u$, respectively, on the scalar product (${}^{u}(\cdot, \cdot) =: (\cdot, \cdot)$):

$$\hat\vartheta_n = \arg\min_{\vartheta \in \Theta} Q_n(\vartheta), \quad Q_n(\vartheta) = |y - g_\vartheta|_n^2.$$

Let the assumptions of Theorems 1.1.1 and 1.1.2 be satisfied. Then

$$Q_n(\hat\vartheta_n) \to \sigma^2 \quad \text{a.s.}$$

and

$$\mathcal{L}\{\sqrt{n}\,(\hat\vartheta_n - \vartheta_0)\} \to N(0, \sigma^2 (k, k)^{-1}), \quad k := \frac{\partial g_\vartheta}{\partial\vartheta}\Big|_{\vartheta = \vartheta_0}.$$

This result corresponds to theorem 7 in Jennrich (1969).
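In practice the limit covariance $\sigma^2 (k, k)^{-1}$ is approximated by its finite-sample counterpart $\sigma^2 (k, k)_n^{-1}$. The following Python sketch illustrates this for the two-parameter exponential model $g_\vartheta(x) = \alpha e^{\beta x}$; the model, the function names and the explicit $2 \times 2$ inversion are assumptions made for the illustration only.

```python
import math

def gradient(x, alpha, beta):
    """k(x) = dg/d(alpha, beta) for g(x) = alpha * exp(beta * x)."""
    e = math.exp(beta * x)
    return (e, alpha * x * e)

def asymptotic_covariance(xs, alpha, beta, sigma2):
    """sigma^2 * (k, k)_n^{-1} with (k, k)_n = n^{-1} * sum_t k(x_t) k(x_t)'."""
    n = len(xs)
    a = b = d = 0.0
    for x in xs:
        k1, k2 = gradient(x, alpha, beta)
        a += k1 * k1
        b += k1 * k2
        d += k2 * k2
    a, b, d = a / n, b / n, d / n
    det = a * d - b * b  # must be nonzero (regularity of (k, k)_n)
    inverse = ((d / det, -b / det), (-b / det, a / det))
    return tuple(tuple(sigma2 * v for v in row) for row in inverse)

def standard_errors(xs, alpha, beta, sigma2):
    """Approximate standard errors of the LSE: sqrt(diag(cov) / n)."""
    cov = asymptotic_covariance(xs, alpha, beta, sigma2)
    n = len(xs)
    return (math.sqrt(cov[0][0] / n), math.sqrt(cov[1][1] / n))

# With beta = 0 and alpha = 1 the gradient is (1, x), as in a straight-line fit
cov = asymptotic_covariance([0.0, 1.0, 2.0], alpha=1.0, beta=0.0, sigma2=1.0)
print(cov)
```

In an application the unknown $\vartheta_0$ and $\sigma^2$ would be replaced by $\hat\vartheta_n$ and $Q_n(\hat\vartheta_n)$, which is justified by the consistency statements of Theorem 1.1.1.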


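Remark 1.1.4 If $\varepsilon_t \sim N(0, \sigma_t^2)$ is valid, then the maximum likelihood estimator and the GLSE coincide and we have

$$\mathcal{L}\{\sqrt{n}\,(\hat\vartheta_n - \vartheta_0)\} \to N(0,\ {}^{w_\sigma}(k, k)^{-1})$$

with $w_\sigma = \{\sigma_t^{-2}\}$.
Thus, the covariance matrix of the asymptotic distribution has the form

$$\Big(\lim_{n \to \infty} n^{-1}\sum_{t=1}^{n} \sigma_t^{-2}\, \frac{\partial g_\vartheta(x_t)}{\partial\vartheta_i}\, \frac{\partial g_\vartheta(x_t)}{\partial\vartheta_j}\Big|_{\vartheta = \vartheta_0}\Big)_{i,j}^{-1}.$$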


Remark 1.1.5 It should be mentioned that the assumption AZ is rather 
restrictive. For instance, it rules out the special nonlinear regression function 


$$g(x_t, \vartheta) = (t + \vartheta)^{\alpha} \quad \text{for } \alpha \ge 1/2:$$


i ie awa) eg COs) 
(ky k) = (tim ‘2 3 ae 


n—>0o oo 


Wu (1981) gave sufficient conditions for the strong consistency as well as for the asymptotic normality of the LSE which admit other growth rates than $n$ for the divergence of

$$D_n(\vartheta, \bar\vartheta) = \sum_{t=1}^{n} (g(x_t, \vartheta) - g(x_t, \bar\vartheta))^2$$

to infinity. For details the reader is referred to that paper.


Remark 1.1.6 We have assumed that the nonlinear part $\beta$ of the regression coefficients has a compact range $\mathcal{B}$ in order to assure strong consistency. Recently, Läuter (1989) proved that the compactness assumption can be replaced by the following conditions:

For every $\vartheta_0$ there are constants $c(\vartheta_0) > 0$ and $d(\vartheta_0) > 0$ such that for all $\vartheta$ with $\|\vartheta\| > d(\vartheta_0)$ and for all $n$,


Y (ale, 9) — glee, Bo)? > et) 


Furthermore, the set of all parameters $\vartheta$ fulfilling $\|\vartheta\| \le d(\vartheta_0)$ is compact.


Remark 1.1.7 As is well known, the LSE is very sensitive with respect to 'outliers'. More robust estimators were constructed by Huber (1964) within the concept of the M-estimators. For the nonlinear regression model Grossmann (1976) introduced, following these ideas, estimators which are the solution of the following minimization problems:

$$\hat\vartheta_n^{(k)}: \quad \min_{\vartheta \in \Theta} \sum_{t=1}^{n} \varrho_k(y_t - g_\vartheta(x_t)),$$

where

$$\varrho_k(z) = \begin{cases} \dfrac{z^2}{2} & \text{for } |z| < k, \\[2mm] k|z| - \dfrac{k^2}{2} & \text{for } |z| \ge k. \end{cases}$$

For these estimators the asymptotic normality and the existence of a $k^*$ for which $\hat\vartheta_n^{(k^*)}$ in $\{\hat\vartheta_n^{(k)} \mid k_1\sigma \le k \le k_2\sigma\}$ has the smallest asymptotic covariance matrix are proved under certain assumptions. It is suggested to estimate this covariance matrix, which depends on the $\varepsilon_t$, and to proceed in two stages.
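The loss function of such an M-estimator is easy to state in code. The following Python sketch shows the standard Huber function (quadratic near zero, linear in the tails) and the resulting criterion; it is an assumed illustration with hypothetical names, not Grossmann's procedure, and the tuning constant `k` is supplied by the user.

```python
import math

def huber_rho(z, k):
    """Huber's function: z^2/2 for |z| < k, and k*|z| - k^2/2 otherwise."""
    az = abs(z)
    if az < k:
        return 0.5 * z * z
    return k * az - 0.5 * k * k

def huber_criterion(theta, xs, ys, g, k):
    """Sum over t of rho_k(y_t - g(x_t, theta))."""
    return sum(huber_rho(y - g(x, theta), k) for x, y in zip(xs, ys))

def exponential(x, theta):
    """Example regression function g(x, theta) = alpha * exp(beta * x)."""
    alpha, beta = theta
    return alpha * math.exp(beta * x)

# One gross outlier contributes linearly, not quadratically, to the criterion
xs = [0.0, 1.0, 2.0]
ys = [exponential(x, (2.0, 0.5)) for x in xs]
ys_outlier = [ys[0], ys[1], ys[2] + 10.0]
print(huber_criterion((2.0, 0.5), xs, ys_outlier, exponential, 1.0))
```

Compared with the squared residual $10^2/2 = 50$, the outlier here only contributes $1 \cdot 10 - 1/2 = 9.5$, which is the source of the robustness.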

So far we have provided some contributions to what is called first-order asymptotic theory for the nonlinear regression model. It is well known that first-order asymptotically efficient procedures can be poor for moderate sample sizes. Usually this is demonstrated for some special cases including numerical comparisons or simulations. In fact only the linear term of the Taylor expansion of the nonlinear regression function yields a contribution to the limit distribution of various test statistics. Thus, as long as first-order asymptotics are considered, one gets the same results one would get for the artificial model linearized at the true parameter, i.e.


B) , 


neglecting the dependence of $\frac{\partial}{\partial\vartheta}\, g(x_t, \vartheta)$ on $\vartheta$. Therefore, almost no characteristics of the curvature of the regression function influence the statistical inference based on limit distributions (see Bates and Watts, 1980; Hamilton, Watts, and Bates, 1982; Amari, 1982). On the other hand, first-order asymptotic statistical procedures admit an approximation accuracy of $O(n^{-1/2})$ at most, which can be too poor for applications. Therefore, one would like to derive statistical procedures which hopefully can also be applied for moderate sample sizes with a sufficient accuracy. Following the Pfanzagl school (Pfanzagl, 1973a, b; Michel, 1975; see also Chibisov, 1972, 1973a, b; or Akahira and Takeuchi, 1976), the idea is to improve first-order efficient procedures using Edgeworth expansions for the statistics involved. In Schmidt and Zwanzig (1985) stochastic expansions as well as Edgeworth expansions of second order for the LSE and the residual sum of squares are used to construct sequences of tests, confidence regions and estimators which possess an approximation accuracy of $O(n^{-1/2})$ and which are second-order asymptotically efficient if the observations are normally distributed. For instance, an estimator $\tilde\vartheta_j = \tilde\vartheta_j(y_1, \dots, y_n)$ for $\vartheta_j$ is constructed which possesses the following properties. Let us denote by $P_{n\zeta}$ the distribution of $Y_n = (y_1, \dots, y_n)'$, $\zeta = (\vartheta, \sigma^2, \kappa)$, where $\kappa$ is the distribution of $\sigma^{-1}\varepsilon_t$.


(a) $\tilde\vartheta_j$ is asymptotically median unbiased of order $O(n^{-1/2})$:

$$P_{n\zeta}(\tilde\vartheta_j \ge \vartheta_j) \ge \frac{1}{2} - O(n^{-1/2})$$

and

$$P_{n\zeta}(\tilde\vartheta_j \le \vartheta_j) \ge \frac{1}{2} - O(n^{-1/2}),$$

both inequalities holding uniformly in $\zeta$ from compact subsets of the parameter space.

(b) $\tilde\vartheta_j$ possesses greatest coverage probabilities for shrinking intervals in the class of all asymptotically median unbiased estimators of order $O(n^{-1/2})$:

$$P_{n\zeta}\{\vartheta_j - n^{-1/2}h' < \tilde\vartheta_j < \vartheta_j + n^{-1/2}h\} \ge P_{n\zeta}\{\vartheta_j - n^{-1/2}h' < \vartheta_j^* < \vartheta_j + n^{-1/2}h\} + O(n^{-1/2})$$

uniformly on compact subsets of the parameter space for all estimators $\vartheta_j^* = \vartheta_j^*(y_1, \dots, y_n)$ which are asymptotically median unbiased of order $O(n^{-1/2})$.

Here the estimator $\tilde\vartheta_j$ has the following structure:

$$\tilde\vartheta_j = \hat\vartheta_j + \frac{1}{n}\, K(\hat\theta),$$

where $K(\theta)$ is a bounded function of $\theta$ on compact subsets of the parameter range and $K(\hat\theta)$ is bounded in $n$. Moreover $\hat\theta = (\hat\vartheta, \hat\sigma^2)'$ is the composition of the LSE and the residual sum of squares, and $\hat\vartheta_j$ denotes the LSE for $\vartheta_j$.

Notice that $K(\theta)$ involves second-order partial derivatives of $g(x, \vartheta)$ with respect to $\vartheta$. For details the interested reader is referred to the paper mentioned above.




1.1.6 Asymptotic optimality 


In this section we will establish two different asymptotic optimality properties of the LSE. It will be shown that the GLSE, and thus the two-stage estimator $\tilde\vartheta_n$, is the asymptotically best WLSE in the adequate model, by comparing the corresponding limiting covariance matrices. Furthermore, assuming a normal distribution for the observations, asymptotic optimality of the GLSE holds in the larger class of all asymptotically normally distributed estimators. There the normal distribution is necessary for asymptotic optimality. Then we will investigate under which conditions the maximum likelihood estimator is asymptotically optimal under an arbitrary distribution assumption.
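In Theorem 1.1.2, under the assumptions A1, A2, A3, A5, A6, A7, A8, with the $(m+p) \times (m+p)$ matrices

$$B^{-1}(u) = {}^{u}(k_{\vartheta_0}, k_{\vartheta_0})$$

and

$$C(u) = \Big(\lim_{n \to \infty} \frac{1}{n}\sum_{t=1}^{n} \sigma_t^2 u_t^2\, k_i(x_t)\, k_j(x_t)\Big)_{i,j},$$

the convergence

$$\mathcal{L}\{\sqrt{n}\,(\hat\vartheta_n - \vartheta_0)\} \to N(0, M(u))$$

with $M(u) = B(u)\, C(u)\, B(u)$ was proved. If, analogous to the notation in the linear model, we introduce the matrices

$$X_n = \begin{pmatrix} k_{\vartheta_0}'(x_1) \\ \vdots \\ k_{\vartheta_0}'(x_n) \end{pmatrix}, \quad \Sigma_n = \mathrm{Diag}[\sigma_1^2, \dots, \sigma_n^2]$$

and $U_n = \mathrm{Diag}[u_1, \dots, u_n]$, it obviously holds that

$$M(u) = \lim_{n \to \infty} n\,(X_n'U_nX_n)^{-1} X_n'U_n\Sigma_nU_nX_n\,(X_n'U_nX_n)^{-1} = \lim_{n \to \infty} n\, L_n\Sigma_nL_n' \tag{56}$$

with $L_n = (X_n'U_nX_n)^{-1} X_n'U_n$.
From the Gauss-Markov theorem (cf. Bunke and Bunke, 1986, theorem 2.1.1), for any matrix $L_n$ with $L_nX_n = I$,

$$L_n\Sigma_nL_n' \ge (X_n'\Sigma_n^{-1}X_n)^{-1}$$

and thus

$$M(u) \ge \lim_{n \to \infty} n\,(X_n'\Sigma_n^{-1}X_n)^{-1} = {}^{w_\sigma}(k_{\vartheta_0}, k_{\vartheta_0})^{-1} = M(w_\sigma), \quad w_\sigma = \{\sigma_t^{-2}\}.$$

That means the following theorem is true.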




Theorem 1.1.3 Let $\mathcal{W}$ be the set of all weighting sequences $w$ for which the assumptions of Theorem 1.1.2, part 1, are fulfilled. Then $M(w_\sigma) \le M(u(w))$ for all $w \in \mathcal{W}$ if $w_\sigma \in \mathcal{W}$. (We use $A \le B$ for two positive semidefinite matrices $A$ and $B$ iff $B - A$ is positive semidefinite.)
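The inequality of Theorem 1.1.3 can be checked numerically in the simplest case of a single parameter, where $M(u) = B\,C\,B$ reduces to a scalar. The following Python sketch uses finite-sample versions of $B^{-1}(u)$ and $C(u)$; the data values and function names are hypothetical and serve only as an illustration.

```python
def sandwich_variance(ks, sigma2, us):
    """Scalar M(u) = C(u) / B^{-1}(u)^2 for gradient values ks and weights us."""
    n = len(ks)
    b_inv = sum(u * k * k for u, k in zip(us, ks)) / n
    c = sum(s * u * u * k * k for s, u, k in zip(sigma2, us, ks)) / n
    return c / (b_inv * b_inv)

# Heteroscedastic example: gradient values k_t and error variances sigma_t^2
ks = [1.0, 2.0, 3.0]
sigma2 = [1.0, 4.0, 9.0]

unweighted = sandwich_variance(ks, sigma2, [1.0, 1.0, 1.0])
optimal = sandwich_variance(ks, sigma2, [1.0 / s for s in sigma2])
other = sandwich_variance(ks, sigma2, [3.0, 1.0, 2.0])
print(unweighted, optimal, other)
```

For these values the unweighted variance is $1.5$ while the weights $1/\sigma_t^2$ give $1.0$; by the Cauchy-Schwarz inequality no other positive weighting can fall below the latter.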

The optimal weights $1/\sigma_t^2$ are unknown in general and thus the GLSE cannot be computed. But the two-stage estimator $\tilde\vartheta_n$ has the same limiting distribution as the GLSE and is the asymptotically best WLSE, too. As in the linear model one can ask here under which conditions the OLSE

$$\hat\vartheta_n = {}^{w_0}\hat\vartheta_n \quad \text{with } w_0 = \{1\},$$

which can in general be computed more easily, is asymptotically optimal.
Because of (56) and the results of Kruskal (1968) (see Bunke and Bunke, 1986) 
this is fulfilled for instance if for all sufficiently large n, 


holds. ($\mathcal{R}(X_n)$ is the linear subspace in $\mathbb{R}^n$ generated by the columns of $X_n$.)


Example 1.1.3 We consider an exponential regression model with repeated 
observations in two different points x), %2) € IR?: 


Y, = axe + &,, He? = Oy ECM =a, 


with 4h —- (a a a Ny}, Is — {ny s- itr seeg Ny, — No} 
nt Oe, by Be 
n 


n—>0o 


Then all assumptions from Theorem 1.1.3 as well as (57) are fulfilled and the OLSE is asymptotically optimal if $\alpha \in \mathbb{R}^1 \setminus \{0\}$ and if $\beta$ varies in a compact interval in $\mathbb{R}^1$. More generally, the OLSE is asymptotically optimal in repetition models


Yt = Jo(Xj)) + &, Ee; = Oj) » Core 


with Y Pate PaaS |Z’; | =N;, pa Mig = {Dares 5570} 


and He 381 4= 1.5m + p 
n 
if A, holds. 


In the following we will define what we want to understand by a best asymptotically normally distributed (BAN) estimator. For this purpose we need a parametric description of the distribution family for the observation vector $Y_n = (y_1, \dots, y_n)'$. Let $\kappa_t$ be the distribution of $\varepsilon_t$ and let $\kappa = (\kappa_1, \kappa_2, \dots)$. Let $\mathcal{K}$ denote the set of all such sequences $\kappa$; then the parameter $\zeta = (\vartheta, \kappa)$ characterizes the distribution of the sequence $(y_1, y_2, \dots)$. In the following let $P_{n\zeta}$ be the distribution for the vector $Y_n$ induced by that distribution.


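Definition 1.1.3 A sequence $\{\hat\vartheta_n\}$ of estimators,

$$\hat\vartheta_n = \hat\vartheta_n(Y_n) \quad\text{with}\quad \mathcal{L}(\sqrt{n}\,(\hat\vartheta_n - \vartheta) \mid \zeta) \to N(0, V(\zeta)),$$

is said to be a best asymptotically normally distributed (BAN) sequence of estimators for $\vartheta$ if for each other estimation sequence $\{\tilde\vartheta_n\}$ with $\mathcal{L}(\sqrt{n}\,(\tilde\vartheta_n - \vartheta) \mid \zeta) \to N(0, S(\zeta))$ it holds that

$$V(\zeta) \le S(\zeta)$$

for all $\vartheta \in \Theta \setminus N$ and for all $\kappa \in \mathcal{K}$ with $\zeta = (\vartheta, \kappa)$. Here $N \subset \Theta$ is a Lebesgue-zero set.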


Next, by using general results by Bahadur (1964, 1967), we derive a lower 
bound for the limiting covariance matrix of an arbitrary asymptotically 
normally distributed estimator. To do this we need some regularity assump- 
tions: 


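A10 The distributions $\kappa_t$ of $\varepsilon_t$ are absolutely continuous with respect to a $\sigma$-finite measure $\mu$. The logarithm $l_t(\varepsilon_t)$ of a positive variant of the $\mu$-density of $\kappa_t$ is twice continuously differentiable,

$$\Big(l_t^{(1)}(\varepsilon) = \frac{d}{d\varepsilon}\, l_t(\varepsilon), \quad l_t^{(2)}(\varepsilon) = \frac{d^2}{d\varepsilon^2}\, l_t(\varepsilon)\Big),$$

with

(a) $E\{l_t^{(1)}(\varepsilon_t)\} = 0$, $E\{l_t^{(2)}(\varepsilon_t)\} = -D\{l_t^{(1)}(\varepsilon_t)\} > -\infty$, $D\{l_t^{(2)}(\varepsilon_t)\} < \infty$, $t = 1, 2, \dots$;

(b) $l_t^{(2)}$ is uniformly continuous, uniformly with respect to $t = 1, 2, \dots$;

(c) for each $t$ there exists a function $R_t$ and a $\delta_t > 0$ with $ER_t(\varepsilon_t) < \infty$ and $|l_t^{(2)}(\varepsilon_t + h)| \le R_t(\varepsilon_t)$ for $|h| \le \delta_t$;

(d) $\frac{1}{n}\sum_{t=1}^{n} s_t$ and $\frac{1}{n}\sum_{t=1}^{n} D\{l_t^{(2)}(\varepsilon_t)\}$ with $s_t = D\{l_t^{(1)}(\varepsilon_t)\}$ are asymptotically bounded.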

Moreover we demand: 


Ai (a) g(a) is uniformly continuous with respect to t = Pe oe. sce OS 


$$\frac{1}{n} \max_{1 \le t \le n} \|k_\vartheta(x_t)\|^2 \xrightarrow[n \to \infty]{} 0.$$

(b) $\lambda_{\max}[K_\vartheta(x_t)] \le c < \infty$ with a constant $c$ not depending on $t$.




Lemma 1.1.6 Under the assumptions A,, As5, Ayo, Ay, and Wz, Wo, 8 € W with 
s = {s,}, it holds for each h € R?*™ and for each inner point } of O that 


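$$\mathcal{L}\bigl(L_n(Y_n, \zeta_n) - L_n(Y_n, \zeta) \mid \zeta\bigr) \to N\bigl(-\tfrac{1}{2}\, h' I(\zeta) h,\; h' I(\zeta) h\bigr)$$

with $\zeta_n = (\vartheta + n^{-1/2} h, \varkappa)$ and $I(\zeta) = \langle k_\vartheta, k_\vartheta \rangle_s$, and thus the sequence $\{P_{n\zeta_n}\}$ is
contiguous to the sequence $\{P_{n\zeta}\}$ according to result [A 2.2], where $L_n(Y_n, \zeta)$
denotes the logarithmic density of $Y_n$.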


Proof. With 3, = 3 + n-V2h, we have: 
Ay: = TAs. es) an LXas ¢) 


= > [uly re go,(%t)) aa Liye Ta go(a))] 


=a 2 [Alen + Jo(X1) — Yo (&t)) _ 1,(e4)]. 


For U;(€; + Gin) with Qn := 9o(%1) — go,(%,) we use a Taylor expansion up to 
the second term around ¢, and obtain 


A, = => And) (&,) aes ~ Yaz,l( (& + A@in) =: An + Br 
t=1 
with At => Ai(€t) and [Ai < ls 
A further Taylor expansion of g» (x;) around gs(x;) yields the existence of 
real numbers 6, with |6,| < 1 and 


1 
A, = oe 5 h' lal %1) I (e;) = SoS Kae anit) hl (e;) 
n t= 11 N t=1 
=O pene 


Due to results from Bunke and Bunke (1986, theorem 2.4.3 and lemma 2.4.2)
it holds that

$$\mathcal{L}(C_n \mid \zeta) \to N(0, h' I(\zeta) h), \qquad (58)$$


since the conditions H{I{)(¢,)} = 0, D-C, ——> h'I(¢) h and 


2 n—>0o 


1 
max — h’ks(x;,) k»(x,)' h ——> 0 


are satisfied. Furthermore, D, ts 0 results from £-D, = 0 and 


D(Dn) = > Y (WK 54 5n-un(@e) h)? sy) S — ~ |! ae -s See Os 




Next we show 


1 
[pers —S WI(e)h, 


which proves the lemma. With 


1 
by, >= =— A'keg(x;) heo(a4)' h 
2n 


n 


iW nal 5h Keo (X1) bh’ K 5.5 5,n-a/2n(1) 


1 ! 
+ Qn? (h K 54. 5n-/n(Xt) h)? 


we have 


We show G,, — at | es h'l(C) h and F, Pay According to the definition 


of b,, it holds with 


1 n 
Gi, = Sit uh Tes (2,) Ko(a,)’ AUP (&,), 


Gen = D (2'K 54 5yn-210n(@1) 2)? Uy? (Et) 
tes 
and 
Gin = 3 h’K 5.5. 5n-2/2n(%1) he’ keg (ae) UY? (€1) 
nN t=1 
that 


G, ae Gin Gon Gign O 
1 ; 
G,, converges in P,; probability to xr h'I(¢) h since 


1 1 
1 are h' %(ko, ko)n kh — > —— WI (C)h 


n—>co 9 




and 
DG =Te ys oa II {AI [? D-{1?(e,)) 
< see max [ho [ale — a Delle er 0 
n 
hold. From 
1 n 
E:Go, = —— » (2'K 94 64n-2/2n(1) h)? S&S ale ae = 2 So 0 
2n? t=1 2n 
and 
1 n 
DGizn Sarr na (h' K54.5,n-12n(X1) h)4 Del (e1)} 
4n* 124 


< + jn + -> De{l(e,)} ——> 0 
=1 


P, 
Sat) 


it follows that G2, eS 0, and analogously it can be shown that Gz, 
It remains to prove that 
nm 
= 2h fin(lt” (&¢ + Aydin) — UP (e )) 
with 
i; pha ERC ja : (h'K (a4) h)? 
ities on Bt) MO\t On? 9+ en PAR t 


1 
+ Tale h’kes (1) WK 5 + 54n-10n(%t) A 


converges to zero in P,; probability. 

Because of $A_{11}$ and $|\lambda_t| < 1$, $|\lambda_t a_{tn}|$ is, uniformly in $t$, smaller than any
given positive number $\eta$, provided $n$ is sufficiently large. Taking into account
$A_{10}$ (b), for each $\eta > 0$ there is an $n_0$ such that for all $n \ge n_0$


sup # (L{?(& + AQin) — l?)(&,)| 7) 


1StSn 
is fulfilled. The inequality 
LF, S sup Bs |UP(&, + Adin) — YP (€:)| Dd Ifenl 
t=1 


1StSn 
and 
e 1 
Weal 5 —S eA 
implies 
E,\F,| 0 andthus F,-“>+0. 




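In case $\varkappa_0$ is arbitrary but fixed, $Y_n$ has a distribution in the family
$\{P_{n\zeta} \mid \zeta = (\vartheta, \varkappa_0) \in \Theta \times \{\varkappa_0\}\}$ with the finite-dimensional parameter $\vartheta \in \Theta$.
By Lemma 1.1.4 and [A 2.8] we have for each estimation sequence $\{\tilde\vartheta_n\}$ with

$$\mathcal{L}\bigl(\sqrt{n}\,(\tilde\vartheta_n - \vartheta) \mid (\vartheta, \varkappa_0)\bigr) \to N(0, S(\vartheta, \varkappa_0)) \quad\text{for all } \vartheta \in \Theta$$

that

$$S(\vartheta, \varkappa_0) \ge I^{-1}(\vartheta, \varkappa_0) \quad\text{for almost all } \vartheta.$$

If, moreover, $\{\tilde\vartheta_n\}$ is an estimation sequence with

$$\mathcal{L}\bigl(\sqrt{n}\,(\tilde\vartheta_n - \vartheta) \mid (\vartheta, \varkappa)\bigr) \to N(0, S(\vartheta, \varkappa)) \quad\text{for all } \zeta = (\vartheta, \varkappa) \in \mathcal{Z},$$

then this yields

$$S(\vartheta, \varkappa) \ge I^{-1}(\vartheta, \varkappa) \quad\text{for almost all } \vartheta \text{ and for all } \varkappa.$$

Therefore we have proved the following theorem:


Theorem 1.1.4 Under the assumptions of Lemma 1.1.6, $I^{-1}(\zeta) = \langle k_\vartheta, k_\vartheta \rangle_s^{-1}$
is for almost all $\vartheta$ a lower bound for the limiting covariance matrix of an
arbitrary asymptotically normally distributed estimation sequence.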


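Remark 1.1.8 If $\{\tilde\vartheta_n\}$ is an estimation sequence with $\mathcal{L}(\sqrt{n}\,(\tilde\vartheta_n - \vartheta) \mid \zeta)
\to N(0, S(\vartheta, \varkappa))$ and if, for each fixed $\varkappa$, $S(\vartheta, \varkappa)$ and $I(\vartheta, \varkappa)$ are continuous in $\vartheta$,
then $S(\zeta) \ge I^{-1}(\zeta)$ for all $\zeta \in \mathcal{Z}$. Namely, if $N \subset \Theta$ is a Lebesgue-zero set,
then to each $\vartheta \in N$ there exists a sequence $\{\vartheta_m\}$ of parameters $\vartheta_m \in \Theta \setminus N$ with
$\vartheta_m \to \vartheta$ as $m \to \infty$, and we obtain

$$S(\zeta) = S(\vartheta, \varkappa) = \lim_{m \to \infty} S(\vartheta_m, \varkappa) \ge \lim_{m \to \infty} I^{-1}(\vartheta_m, \varkappa) = I^{-1}(\vartheta, \varkappa) = I^{-1}(\zeta).$$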


Now, from Theorem 1.1.4, the BAN property of the GLSE follows easily (and
consequently that of the two-stage estimator and that of the OLSE under
(57)) under the assumption of a normal distribution.


Theorem 1.1.5 Under the assumptions of Theorem 1.1.2, part 1, and of Lemma
1.1.6, and $\varepsilon_t \sim N(0, \sigma_t^2)$, the GLSE is BAN. In case the $\varepsilon_t$ are identically
distributed, it holds with $\zeta_0 = (\vartheta_0, \varkappa)$ that $M(w_0) = I^{-1}(\zeta_0)$ iff $\varepsilon_t \sim N(0, \sigma^2)$.


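Proof. Because of Theorem 1.1.2, $M(w_0) = \langle k_{\vartheta_0}, k_{\vartheta_0} \rangle_{w_0}^{-1}$ is valid and under
$\varepsilon_t \sim N(0, \sigma_t^2)$ we have $w_0 = \{1/\sigma_t^2\} = \{s_t\} = s$. In the identically distributed
case,

$$M(w_0) = \sigma^2 \langle k_{\vartheta_0}, k_{\vartheta_0} \rangle^{-1} = \frac{1}{D\{l^{(1)}(\varepsilon_t)\}}\, \langle k_{\vartheta_0}, k_{\vartheta_0} \rangle^{-1} = I^{-1}(\zeta_0)$$

implies $D\{l^{(1)}(\varepsilon_t)\} = \sigma^{-2}$, which gives with [A 3.13] $\varepsilon_t \sim N(0, \sigma^2)$. □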
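
The condition in Theorem 1.1.5 can be checked numerically for identically distributed errors. The following is a minimal sketch, not part of the original text: it evaluates the product $\sigma^2 \cdot D\{l^{(1)}(\varepsilon_t)\}$ by a midpoint rule for two error densities chosen here purely for illustration (standard normal and standard logistic). The function names and the integration range are assumptions of this sketch. A product equal to 1 means that the least-squares covariance attains the lower bound; a product above 1 means it does not.

```python
import math

def integrate(f, lo, hi, steps=40_000):
    """Midpoint rule for the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))

def variance_times_information(density, score, lo=-40.0, hi=40.0):
    """Return sigma^2 * D{l'(eps)} for a symmetric, zero-mean error density.

    `score` is l'(eps), the derivative of the log-density (hypothetical name).
    """
    variance = integrate(lambda e: e * e * density(e), lo, hi)
    information = integrate(lambda e: score(e) ** 2 * density(e), lo, hi)
    return variance * information

def normal_density(e):
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)

def logistic_density(e):
    # Standard logistic density written as 1 / (4 cosh^2(e / 2)).
    return 0.25 / math.cosh(0.5 * e) ** 2

# Normal errors: l'(e) = -e. Logistic errors: l'(e) = -tanh(e / 2).
ratio_normal = variance_times_information(normal_density, lambda e: -e)
ratio_logistic = variance_times_information(
    logistic_density, lambda e: -math.tanh(0.5 * e)
)
```

Under these assumptions `ratio_normal` is 1 up to integration error, while `ratio_logistic` is close to $\pi^2/9$, so in the logistic case the least-squares limiting covariance exceeds the bound of Theorem 1.1.4.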




Thus we have shown that the GLSE is a best asymptotically normally distributed
estimator under the normal distribution. But in this case it is the maximum
likelihood estimator, too, and it can be expected that the MLE, computed under 
an arbitrary distribution assumption, is always BAN. This will be proved 
in the following. For this purpose we first show the strong consistency of the 
MLE in Theorem 1.1.6 and derive its limiting distribution in Theorem 1.1.7, 
where we restrict ourselves to the case of the adequate nonlinear model 


$$y_t = g_{\vartheta_0}(x_t) + \varepsilon_t, \quad t = 1, 2, \ldots,$$


with $\vartheta_0 \in \Theta \subset \mathbb{R}^k$ and compact $\Theta$.


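Definition 1.1.4 Each measurable solution $\hat\vartheta_n$ of

$$L_n(Y_n, \hat\vartheta_n) = \sup_{\vartheta \in \Theta} L_n(Y_n, \vartheta)$$

with $L_n(Y_n, \vartheta) = \sum_{t=1}^{n} l_t(y_t - g_\vartheta(x_t))$ is called a maximum likelihood estimator
(MLE) for $\vartheta$. As, by assumption, $g_\vartheta$ is continuous in $\vartheta$ and $\Theta$ is a compact subset
of $\mathbb{R}^k$, there exists at least one MLE according to [A 3.1].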


Theorem 1.1.6 Let the following assumptions hold: A,, Az, As, Ag, Ayw(a), 
Ajo (d); 


S,Wy WW; 
go(x,) ts bounded uniformly in & and ¢; 


1 


co 
2 
hod 22 


we 


BUT; (&:) I < 00; 


_ 


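for each sequence $\{d_t\}$ of real numbers $d_t$ there exists a sequence of measurable
functions $R_t$ and a positive number $c$ with

$$l_t^{(2)}(z + d_t) \le R_t(z) \quad\text{for all } z \in \mathbb{R}^1,$$
$$E R_t(\varepsilon_t) \le -c, \quad t = 1, 2, \ldots, \quad\text{for a } c > 0;$$

$$\sum_{t=1}^{\infty} \frac{1}{t^2}\, E R_t(\varepsilon_t)^2 < \infty.$$

Then for each MLE $\hat\vartheta_n$ we have

$$\hat\vartheta_n \xrightarrow{\text{a.s.}} \vartheta_0.$$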

Proof. The proof proceeds analogously to the proof of the consistency of the 
WLSE. Because of 


L,(¥n, 9) = Daly a tk (24) 


1 n 
o > az l} (8, + da) 
t 


=1 


with a; = g»,(x:) — go(x,) and 0; = 6,(&;), |d;| < 1 we have with 


1 n 
A,(9) = — Y adl}(&,) and 
B,() -= — >) ajl?(&, + Ja) 
2n t=1 
that 


1 
4 (Ln(¥n, #) — Ln(Yn, %)) = An(O) + B,(8). 


According to the strong law of large numbers A,(#) ——+ 0 results for each # 
foe} 2 


if the series }) —, converges. But this follows from Abel’s convergence crite- 
rionand ‘1 


1 n 
— sa; =* 


lg — 9o,\n moar *l9e — ¥o,| 
NM t=1 


With the analogous arguments used in the proof of Lemma 1.1.2 one shows 
that, moreover, 


sup A,(o) > 0. 


EO 


In the next step we derive an upper bound for B,() 


n 1 n " 
B,(?) = = > Cr | (e, + 6)a;) S Pie x a; Rie) 
f= t=1 


1 
= aes a; ( Re(er) = ER,(e:)) ie Arh — >: a LR (1). (59) 
2n t=1 i-1 
The first summand of the right-hand side of (59) converges uniformly in # 
almost surely towards zero (again as a consequence of Lemma 1.1.2), and the 
second summand is bounded uniformly in # for sufficiently large n by 
1 C 
a S a BR, (&:) = —> [ge 


Cc 
— go|, = —— |9e — 90,” 
2n t=1 i 4 




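Until now we have shown that, for almost all error sequences $\{\varepsilon_t\}$, there exists
an integer $n_0 = n_0(\{\varepsilon_t\})$ so that, uniformly in $\vartheta$,

$$\frac{1}{n}\bigl(L_n(Y_n, \vartheta) - L_n(Y_n, \vartheta_0)\bigr) \le -\frac{c}{4}\, |g_\vartheta - g_{\vartheta_0}|_n^2 \qquad (60)$$

holds for all $n \ge n_0$.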

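Now let $\hat\vartheta_n$ be an MLE and $\{\varepsilon_t\}$ a sequence with (60). Because of $\hat\vartheta_n(\varepsilon_1, \ldots, \varepsilon_n)
\in \Theta$ and because of the compactness of $\Theta$, each limit point of $\{\hat\vartheta_n\}$ lies in $\Theta$.
Let $\vartheta'$ be an arbitrary limit point and $\{\hat\vartheta_{n_k}\}$ a subsequence converging to $\vartheta'$;


then it holds for $n_k \ge n_0$ that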


1 1 
Os om ade ee Bn,) er LAY» 3o)) = ass ¢ \95,, me 9.0 i 


1 Pe 
pea carne go — go,|"- 


From this it follows that |g», — g»,| = 0 and with A, we obtain ® = %. 


Theorem 1.1.7 If in addition to the assumptions of Theorem 1.1.6 the assump- 
tions A, for the class KH, extended by the functions ky, - kp, 1,7 = 1,...,p 
and A, (b) for I!(&,) instead of &, are satisfied, then 


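$$\mathcal{L}\bigl(\sqrt{n}\,(\hat\vartheta_n - \vartheta) \mid \zeta\bigr) \to N(0, I^{-1}(\zeta))$$


with $I(\zeta) = \langle k_\vartheta, k_\vartheta \rangle_s$ holds for each MLE $\hat\vartheta_n$, and hence $\hat\vartheta_n$ is BAN according to
Theorem 1.1.4.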


Proof. Because of },, ——> 3 € O™ we can assume without loss of generality 
that &, is a solution of the normal equations 


1 n 
iy Wy, = 95,,(%t)) kg (x) = 0 


nN t=1 




With B(3’) := a 
follows that if 


3 OD ( 9" x 
0=%5,)- 0042) 6,-9). om! 
Cf a EO 
First we show 
CD89") _ D(H") as. 6 (62) 
oy #=9* on” v=8 


Pa 




For this it suffices that, for each i,j = 1,..., D, 


1 n 
es py [2 (ue in 9 yx(©1)) Ke gx (1) Keon (2) — Ue) ko,(t) ko,()| 

1 By ee ' 
= FE (Pye = goplerd) — WCE) gg re) ge () 

i n 
a a x L?(e:) (,x(21) Ke gx (1) — ko (xt) ko,(24)) (63) 
and — 

1 n 
5 x [2 (ye ae 9 gx(%)) K gx (1) — UME) Ko ,(«)| (64) 


converge to zero almost surely. 
Because of the uniform continuity of J?) and 9* => # there exists for 
almost all realizations {e,} and for each 7 > 0 an ny with 


12 
oa > (1 (yi 3 9 y#(21)) — 1(e1)) kx (2) ke yw (1) 
Sn|k yal, [box | woe? 7 lho! Ihe 


for all nm = np. Thus the first summand of the right-hand side of (63) converges 
to zero almost surely. A repeated application of Lemma 1.1.2 yields 


us n 
Dd P(e) Kg (X) kg,(Xt) 


nN t=1 


cea 


t=1 


a 8 = NS Spkeg (21) ko,(2t) a+ —*(ke,, ks,) 


N t=1 


uniformly in #, from which 


n 
Dy (é:) Te gx (21) Kegx (1) ==+ —A(ky,, ko,) 


Slr 
p 
Ll 


results and hence 


DUPE) (hap) Kye (2) — eo, (4) ho (er)\ “+ 0.- 


t=1 


zlR 




Now we consider (64). Because of 


1 n 
ms »L [20 (Ye im 9*(*1) K yx (%) — 1)(&) K5,(%))| =A, + B, 


t=1 

with 

Ay = — S (H(ue — ayqled) — 2M) Kye, 
and 

B, = A DE: 1 (:) (K, aX, « (x 7 K4,(%)) 
as well as 

|A,/? Ss * S (Pu aeG, ox (%)) — YP(e ))? |Ky2 «|, 
and 

© (Us = seqlee)) — WC]? > 0 


we get A, ——> 0. 
At the same time, B,, > 0 is true because, on account of Lemma 1.1.2, 


1 n 
7 DX UP (Es) (Ko(%1) — Ko(x:)) 


t=1 


converges uniformly in # a.s., which proves (62). 


ney 1 (&) Ko,(21) + 0 


nN t=1 
and 
1 n 
— DY PEs) ho, (x1) kp,(x;) 
 t=1 
1 n 
=—) (1?( ) l(€)) ko (%4) ke,(x;) 
1 en 
a+ —(ke, ko,) = —I;,(2) 
imply 
OD()') a.8 
; =r (C) 
OF lye 


and with this, by (62) also 


0B") 
a0" 


"+ —I(¢). (65) 


= 9" 




Furthermore, applying the results of Bunke and Bunke (1986, theorem 2.4.3 
and Lemma 2.4.2) gives 


£(Vn B(9) | ¢) > N(0, 1(0)), 


from which we conclude with (65) that 


£(Vn (8, — 8) |¢) > N(0, 70). 


Remark 1.1.9 Chanda (1976) also gave the lower bound for the limiting
covariance matrix as well as weak consistency and limiting distribution of the
MLE and the GLSE (without proof). His regularity conditions cannot directly
be compared with our assumptions: for example, he does not need the compactness
of the parameter's range, but on the other hand he assumes the existence of the
third derivative of the regression function and the uniform boundedness of $g_\vartheta$
and of all its derivatives.


1.1.7 Asymptotic results for estimators and tests of the variance 


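In this section we consider the special adequate nonlinear model

$$y_t = g_{\vartheta_0}(x_t) + \varepsilon_t, \quad t = 1, 2, \ldots,$$

with

$$g_\vartheta(x) = \alpha' h_\beta(x), \quad \vartheta = (\alpha, \beta) \in \mathbb{R}^p \times B,$$

and independently identically distributed errors $\varepsilon_t$, $t = 1, 2, \ldots$ In the
following let $A_1$ (a), $A_5$, $A_6$, $A_7$, and $A_8$ be fulfilled.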
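We are interested in the asymptotic behaviour of the residual estimator


$$\hat\sigma_n^2 = \frac{1}{n} \sum_{t=1}^{n} \bigl(y_t - g_{\hat\vartheta_n}(x_t)\bigr)^2 = |y - g_{\hat\vartheta_n}|_n^2.$$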


In Theorem 1.1.4 strong consistency of 6? was proved by assuming the com- 
pactness of #, A; and A;. The following theorem provides the asymptotic 
normality of 6°. 


Theorem 1.1.8 $\mathcal{L}\{\sqrt{n}\,(\hat\sigma_n^2 - \sigma^2)\} \to N(0, \gamma)$ with $\gamma = D\varepsilon_t^2$.


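Proof. Because of

$$\hat\sigma_n^2 = Q_n(\hat\vartheta_n) = |\varepsilon|_n^2 + |g_{\vartheta_0} - g_{\hat\vartheta_n}|_n^2 + 2 \langle g_{\vartheta_0} - g_{\hat\vartheta_n}, \varepsilon \rangle_n$$

it follows that

$$\sqrt{n}\,(\hat\sigma_n^2 - \sigma^2) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} (\varepsilon_t^2 - \sigma^2) + \sqrt{n}\, |g_{\vartheta_0} - g_{\hat\vartheta_n}|_n^2 + 2 \sqrt{n}\, \langle g_{\vartheta_0} - g_{\hat\vartheta_n}, \varepsilon \rangle_n,$$

and thus the assertion is established if

$$\sqrt{n}\, |g_{\vartheta_0} - g_{\hat\vartheta_n}|_n^2 + 2 \sqrt{n}\, \langle g_{\vartheta_0} - g_{\hat\vartheta_n}, \varepsilon \rangle_n \qquad (66)$$

converges to zero in probability.
Applying Taylor's theorem to the first summand of (66) yields the existence
of a $\vartheta^*$ with $\|\vartheta^* - \vartheta_0\| \le \|\hat\vartheta_n - \vartheta_0\|$ such that

$$\sqrt{n}\, |g_{\vartheta_0} - g_{\hat\vartheta_n}|_n^2 = 2 \sqrt{n}\, (\hat\vartheta_n - \vartheta_0)' \langle k_{\vartheta^*}, g_{\vartheta^*} - g_{\vartheta_0} \rangle_n. \qquad (67)$$

Because of $\vartheta^* \xrightarrow{\text{a.s.}} \vartheta_0$ and

$$\mathcal{L}\{\sqrt{n}\,(\hat\vartheta_n - \vartheta_0)\} \to N\bigl(0, \sigma^2 \langle k_{\vartheta_0}, k_{\vartheta_0} \rangle^{-1}\bigr),$$

(67) implies that

$$\sqrt{n}\, |g_{\vartheta_0} - g_{\hat\vartheta_n}|_n^2 \xrightarrow{P} 0. \qquad (68)$$

Furthermore, for the second summand of (66) it again holds, with a suitable
$\vartheta^*$, that

$$2 \sqrt{n}\, \langle g_{\vartheta_0} - g_{\hat\vartheta_n}, \varepsilon \rangle_n = -2 \sqrt{n}\, (\hat\vartheta_n - \vartheta_0)' \langle k_{\vartheta^*}, \varepsilon \rangle_n,$$

and for its convergence in probability to zero,

$$\langle k_{\vartheta^*}, \varepsilon \rangle_n \xrightarrow{P} 0 \qquad (69)$$

is sufficient. (69) results from Lemma 1.1.2 and $\vartheta^* \xrightarrow{\text{a.s.}} \vartheta_0$. □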


Next we prove the asymptotic efficiency of $\hat\sigma_n^2$. To this end, let $P_{n\zeta}$
denote the distribution of

$$Y_n = (y_1, \ldots, y_n)' \quad\text{under}\quad \zeta = (\vartheta, \sigma^2, \varkappa) \in \mathbb{R}^p \times B \times \mathbb{R}^+ \times \mathcal{K},$$

and we show that the sequence $\{P_{n\zeta_n}\}$ with $\zeta_n = (\vartheta, \sigma^2 + n^{-1/2} h, \varkappa)$ is contiguous
(cf. [A 2.2]) to the sequence $\{P_{n\zeta}\}$ for any positive $h$ if $\varkappa$ fulfils some regularity
conditions. For this we assume that $\varkappa$ is absolutely continuous with
respect to a $\sigma$-finite measure $\mu$, and by $L(u) = \log \frac{d\varkappa}{d\mu}(u)$ we denote the
logarithm of a positive variant of the $\mu$-density of $U_t := \varepsilon_t / \sigma$.


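Lemma 1.1.7 Under the assumptions:


1. $L$ is twice continuously differentiable w.r.t. $u$,

$$L^{(1)}(u) = \frac{d}{du} L(u), \qquad L^{(2)}(u) = \frac{d^2}{du^2} L(u); \qquad (70)$$

2. $E\{U L^{(1)}(U)\} = -1$, $\quad D\{U L^{(1)}(U)\} = 1 - E\{U^2 L^{(2)}(U)\} > 0$; $\qquad (71)$

3. There are functions $M$ and $J$: $\mathbb{R}^1 \to \mathbb{R}^1$ and a positive number $\delta^* > 0$ such
that, for any $\delta$ with $|\delta| < \delta^*$,

$$|L^{(2)}(u(1 + \delta)) - L^{(2)}(u)| \le J(\delta)\, M(u) \qquad (72)$$

is satisfied with $E\{M(U)\, U^2\} < \infty$ and $J(\delta) \to 0$ as $\delta \to 0$;

it results that

$$\mathcal{L}\bigl(L_n(Y_n, \zeta_n) - L_n(Y_n, \zeta) \mid \zeta\bigr) \to N\bigl(-\tfrac{1}{2} h^2 I(\zeta),\; h^2 I(\zeta)\bigr) \qquad (73)$$

with $I(\zeta) = \frac{1}{4\sigma^4}\, D\{U L^{(1)}(U)\}$, where $L_n(Y_n, \zeta)$ denotes the logarithm of the
density of $Y_n$ with respect to $\mu$.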


Remark 1.1.10 With [A 2.3], (73) implies the contiguity of the sequence
$\{P_{n\zeta_n}\}$ to the sequence $\{P_{n\zeta}\}$.

Remark 1.1.11 The assumptions 2 and 3 are satisfied if $\mu$ is the Lebesgue
measure, $p(u) = \frac{d\varkappa}{d\mu}(u)$ is positive everywhere and if $u\, p(u) \to 0$ as well
as $u^2 p^{(1)}(u) \to 0$ hold for $u \to \pm\infty$. This can easily be proved by partial integration.


Proof of Lemma 1.1.7 We put t = o? and t, = o7. Because of 


Beast Be) | 


t=1 


=—Snlnr+ Ss L(U;) 


t—1) 
it follows that 


. 1 Tt 
L,(Y 5; En) = L(V» a) eS Da [L(U; oe V;) zz: L(U;)] + oy n In Ee 
t=1 n 
(74) 
with 
qil2 — 71/2 
a ae 72 U; 0 
A Taylor expansion around U; up to the second term yields 
UVa) a Age 
t=1 
s Dv L©(U,) + Le + WALO(U, + AV,) =: Aa + B (75) 


where 4, is a random variable withO <4, 31. 




zs qu2 _ me 
Witha, = ta ee 
1 
gt Teh Ye ig et EO ECT ie ee ee 
2 Tn 7 t=1 2 Tn 
— 1h 1 eases 
and because of Yn a, ——>+ ——— and —nIn eee NA, aS 
i n—->co 9 T 9 ie n 8 72 
obtain 
ea i US 2 
oie tea ee CN (eee D{UL®(U)} (76) 
2 Ga S707 peer 
Now we show 
Bele : ue E(U2L®(U). (77) 
7 
i 1 1 Se eae 
Due to B, = — na® nee U?L®(U, + 4V;) and — naz, — > — — it is 
2 N t=1 2, 8 7? 
sufficient to show that 
+ S UPL U, + 4V;) 2 EPL) (78) 
i 
(78) is equivalent to 
ae *: 
— ©) U{LO(U, + AV,) — L(U,)) + 0 (79) 


NM t=1 


Because of (72), 4, < 1 and a, — => 0 it holds, for sufficiently large n, that 


oe 


 t=1 


1 n 
< max J(i,a,) — Py U?7M(U;) 
M t=1 


1Stsn 


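Now we consider arbitrary but fixed parameters $(\vartheta, \varkappa) \in \mathbb{R}^p \times B \times \mathcal{K}_0$,
where $\mathcal{K}_0$ denotes the subset of $\mathcal{K}$ with (70)-(72). Under this assumption
$Y_n$ has a distribution in the one-parameter distribution family
$\{P_{n\sigma^2} \mid \sigma^2 \in \mathbb{R}^+\}$ with $P_{n\sigma^2} := P_{n(\vartheta, \sigma^2, \varkappa)}$. On account of [A 2.8] and (73) it thus
follows for any estimator sequence $\{\tilde\sigma_n^2\}$ with $\tilde\sigma_n^2 = \tilde\sigma_n^2(Y_n)$ and
$\mathcal{L}(\sqrt{n}\,(\tilde\sigma_n^2 - \sigma^2) \mid \sigma^2) \to N(0, S(\sigma^2))$ for all $\sigma^2 \in \mathbb{R}^+$ and fixed $\vartheta$ and $\varkappa$ that

$$S(\sigma^2) \ge I(\zeta)^{-1} \quad\text{for almost all } \sigma^2.$$

As w.r.t. the family $\{P_{n\zeta} \mid \zeta \in \mathcal{Z}\}$ the class of all asymptotically normally
distributed estimators $\tilde\sigma_n^2$ is contained in the class of all asymptotically normally
distributed estimators w.r.t. $\{P_{n\sigma^2} \mid \sigma^2 \in \mathbb{R}^+\}$ for any fixed $\vartheta$ and $\varkappa$, it holds
for any estimation sequence $\{\tilde\sigma_n^2\}$ with

$$\mathcal{L}\bigl(\sqrt{n}\,(\tilde\sigma_n^2 - \sigma^2) \mid \zeta\bigr) \to N(0, S(\zeta)), \quad \zeta \in \mathcal{Z},$$

that

$$S(\zeta) \ge I(\zeta)^{-1} \quad\text{for almost all } \sigma^2 \text{ and for all } (\vartheta, \varkappa) \in \Theta \times \mathcal{K}_0. \qquad (80)$$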


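Now we are able to prove the asymptotic efficiency of $\hat\sigma_n^2$. Let $\mathcal{K}^*$ be the family
of distributions with the densities

$$p_m(u) = c(m)\, u^{2m} \exp\bigl\{-\bigl(m + \tfrac{1}{2}\bigr) u^2\bigr\} \qquad (81)$$

for arbitrary integers $m \ge 0$ with

$$c(m) = \begin{cases} \dfrac{1}{\sqrt{2\pi}} & \text{for } m = 0, \\[2ex] \dfrac{(2m + 1)^{m + 1/2}}{1 \cdot 3 \cdots (2m - 1)\, \sqrt{2\pi}} & \text{for } m \ge 1. \end{cases}$$

Simple computations yield $EU = 0$, $EU^2 = 1$, $E\{U L^{(1)}(U)\} = -1$ and

$$D\{U L^{(1)}(U)\} = 4m + 2 = 1 - E\{U^2 L^{(2)}(U)\}.$$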


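Furthermore,

$$|L^{(2)}(u(1 + \delta)) - L^{(2)}(u)| = \frac{2m}{u^2} \left| 1 - \frac{1}{(1 + \delta)^2} \right| =: J(\delta)\, M(u)$$

with $M(u) = u^{-2}$ holds and thus $J(\delta) \to 0$ as $\delta \to 0$ and $E\{U^2 M(U)\} = 1 < \infty$. Thus $\mathcal{K}^*$ is a subset
of $\mathcal{K}_0$. Additionally, for $\varkappa \in \mathcal{K}^*$,

$$\gamma = D\varepsilon_t^2 = \sigma^4\, D U^2 = \frac{2\sigma^4}{2m + 1} = \frac{4\sigma^4}{D\{U L^{(1)}(U)\}} = I(\zeta)^{-1}$$

holds and hence the lower bound (80) is attained for $\hat\sigma_n^2$.
Thus we have proved the following theorem: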


Theorem 1.1.9 The estimator $\hat\sigma_n^2$ is BAN for all $\varkappa \in \mathcal{K}^*$.
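
The equality of $\gamma$ and $I(\zeta)^{-1}$ behind Theorem 1.1.9 can be verified numerically. The sketch below is an illustration rather than part of the original argument: it uses the closed form $E U^{2j} = \Gamma(m + j + \tfrac12) / \bigl(\Gamma(m + \tfrac12)\,(m + \tfrac12)^j\bigr)$ for the density (81) and the identity $U L^{(1)}(U) = 2m - (2m + 1) U^2$, both of which follow from (81). The function names are hypothetical.

```python
import math

def even_moment(m, j):
    """E U^(2j) for the density p_m(u) = c(m) u^(2m) exp(-(m + 1/2) u^2)."""
    a = m + 0.5
    return math.gamma(m + j + 0.5) / (math.gamma(m + 0.5) * a ** j)

def gamma_and_bound(m, sigma_sq=1.0):
    """Return (gamma, I^{-1}) for errors eps = sigma * U with U distributed as p_m."""
    eu2 = even_moment(m, 1)          # equals 1 for every m
    eu4 = even_moment(m, 2)
    var_u_sq = eu4 - eu2 ** 2        # D(U^2)
    gamma = sigma_sq ** 2 * var_u_sq # gamma = D(eps^2) = sigma^4 D(U^2)
    # U L'(U) = 2m - (2m + 1) U^2, hence D{U L'(U)} = (2m + 1)^2 D(U^2).
    d_score = (2 * m + 1) ** 2 * var_u_sq
    bound = 4.0 * sigma_sq ** 2 / d_score   # I(zeta)^{-1} = 4 sigma^4 / D{U L'(U)}
    return gamma, bound

results = [gamma_and_bound(m) for m in range(6)]
```

For each `m` the two returned numbers agree, which is the numerical counterpart of the statement that the lower bound (80) is attained within this family.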


Remark 1.1.12 Under normality ($\varkappa = N(0, 1)$, $m = 0$), $\hat\sigma_n^2$ is the MLE for $\sigma^2$.
But for all other $\varkappa \in \mathcal{K}^*$ the exact computation of the MLE is complicated
even in the simplest case $y_t = \vartheta + \varepsilon_t$, $t = 1, 2, \ldots, n$, because of the
nonlinearity of the likelihood equations


2 | ah eee 
AC as Se = aude >y (ye am go(2,))? 2() 
2 Paes A 

1 
eneee 
mu rue AE _ (y = go(x,)) keg (a) == ((), 
t=1 1 (Yi ais (2) oF t=1 


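Remark 1.1.13 The equality $\gamma = I(\zeta)^{-1}$ can be obtained iff $\varepsilon_t$ has a density
of the form

$$p_\sigma(\varepsilon) = (\sigma^2)^{-c}\, h(\varepsilon, c) \exp\left\{-\frac{c\, \varepsilon^2}{\sigma^2}\right\}$$

with a positive constant $c$ and a function $h$ not depending on $\sigma^2$.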
Theorem 1.1.8 admits the construction of asymptotic $\alpha$-tests (cf. [A 2.16])
for the test problem

$$H: \sigma^2 \le \sigma_0^2 \quad\text{against}\quad K: \sigma^2 > \sigma_0^2.$$


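If $\varkappa$ is quasinormal ($\gamma = 2\sigma^4$), then the test

$$\varphi_n(Y_n) = \begin{cases} 1 & \text{if } Z_n \ge u_\alpha, \\ 0 & \text{otherwise} \end{cases} \qquad (82)$$

with $Z_n = \dfrac{\sqrt{n}\,(\hat\sigma_n^2 - \sigma_0^2)}{\sqrt{2}\, \sigma_0^2}$ and $u_\alpha$ the $\alpha$-fractile of $N(0, 1)$ is an asymptotic
$\alpha$-test since under the hypothesis $H$ it follows from Theorem 1.1.8 that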


lim sup Bep,(¥) 
yn 62 yn 2 
= Sn yeup Pe, — o*) + —— (0? — 09) = Us 
y2 ze 2 0% 


cine yo, ae ei 


n—>co V2 2 
2 


LS wae. rye pone 
Furthermore, V,, = = ae eats te 2 = 
V2 9% y2 % 
A(c) > 0 for o% > 63, there is a 6 = d(a) > 0 and an m = no(d) such that 


= :i(c) and because of 


S A(c) — 6 is satisfied for all n => ng. For n = ny this implies 


Ley (Yn) = Py Mee Ae 2 PrtV, S Xo) — 9} 


n 


= PaullVn — Ao) SO} 5 


and thus the test (82) is also consistent for fixed alternatives. 
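
The test (82) is straightforward to compute once the residuals of the fitted model are available. The following is a minimal sketch under stated assumptions: the function name and its arguments are hypothetical, the residuals $y_t - g_{\hat\vartheta_n}(x_t)$ are taken as given, and $u_\alpha$ is read as the upper $\alpha$-point of $N(0, 1)$, that is, the $(1 - \alpha)$-quantile.

```python
import math
from statistics import NormalDist

def variance_test(residuals, sigma0_sq, alpha=0.05):
    """Return (Z_n, reject) for H: sigma^2 <= sigma0_sq, assuming quasinormal errors.

    `residuals` are y_t - g(x_t) evaluated at the least-squares estimate.
    """
    n = len(residuals)
    sigma_hat_sq = sum(r * r for r in residuals) / n
    z = math.sqrt(n) * (sigma_hat_sq - sigma0_sq) / (math.sqrt(2.0) * sigma0_sq)
    # Assumption: u_alpha is the upper alpha-point of the standard normal law.
    critical = NormalDist().inv_cdf(1.0 - alpha)
    return z, z >= critical

# Artificial residuals with mean square exactly 1, for illustration only.
example_residuals = [1.0, -1.0] * 50
z_null, reject_null = variance_test(example_residuals, sigma0_sq=1.0)
z_alt, reject_alt = variance_test(example_residuals, sigma0_sq=0.5)
```

With these artificial residuals the statistic is zero when $\sigma_0^2 = 1$ and the hypothesis is rejected when $\sigma_0^2 = 0.5$.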




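In the next theorem we will derive the limiting distribution of the test
statistic $Z_n$ under local alternatives

$$K_n: \sigma_n^2 = \sigma_0^2 + n^{-1/2} h \qquad (83)$$

with $h > 0$.


Theorem 1.1.10 For all $\zeta_n = (\vartheta, \sigma_n^2, \varkappa)$ with $\sigma_n^2 = \sigma_0^2 + n^{-1/2} h$ and $\varkappa \in \mathcal{K}_0$
it holds that

$$\mathcal{L}(Z_n \mid \zeta_n) \to N\left(\frac{h}{\sqrt{2}\, \sigma_0^2},\; 1\right).$$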


Proof. According to Lemma 1.1.7 the sequence {P,;} is contiguous to the 


Prop Pere 
sequence {P,,} with Cy = (8, 05, x). Hence nj, acl 0 results from 4, LL 
and thus 


F 1 
£(/n (62 — 0) |6,) =£ {= Eee — 63) + mn | ) > N(0, 204) 
nm t=1 
With (84) and (84) 
3 \/n (62, — 0?) & h 


72 0 /2 0 


the assertion is proved. □
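
Theorem 1.1.10 gives the asymptotic power of the test (82) under the local alternatives (83): since $Z_n$ is asymptotically $N(h / (\sqrt{2}\, \sigma_0^2), 1)$, the rejection probability tends to $1 - \Phi(u_\alpha - h / (\sqrt{2}\, \sigma_0^2))$. The sketch below evaluates this expression; the function name is hypothetical and $u_\alpha$ is again assumed to be the upper $\alpha$-point of $N(0, 1)$.

```python
import math
from statistics import NormalDist

def local_power(h, sigma0_sq, alpha=0.05):
    """Asymptotic power of the variance test at sigma_n^2 = sigma0_sq + h / sqrt(n)."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha)   # assumed meaning of u_alpha
    shift = h / (math.sqrt(2.0) * sigma0_sq)     # limiting mean of Z_n
    return 1.0 - std_normal.cdf(critical - shift)

power_at_null = local_power(0.0, 1.0)
power_small = local_power(1.0, 1.0)
power_large = local_power(2.0, 1.0)
```

For $h = 0$ the expression reduces to the level $\alpha$, and it increases with $h$.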
If $\varkappa$ is not quasinormal, the test (82) cannot be applied. In this case a
consistent estimator $\hat\gamma_n$ for $\gamma$ is needed.


Lemma 1.1.8 The estimator

$$\hat\gamma_n = \frac{1}{n} \sum_{t=1}^{n} \bigl((y_t - g_{\hat\vartheta_n}(x_t))^2 - \hat\sigma_n^2\bigr)^2$$

is weakly consistent for $\gamma$ in case that $\varkappa$ has a finite moment of 6th order.


Proof. Because of 
‘ by 4 g (z;) 4 _ 6 
“ (ys 95, t ) n 


and 64 + o4, it is sufficient to show that 


3 
3 


1 
+ 6 es ya (go(a1) eetle (21)? g—4—) (go(2) a 94 (71)) & 


N t=1 z N t=1 
+= ¥ (aslo) — 94,(00) (85) 




converges to #,&} in P, probability. As this is the case for the first summand 
of the right-hand side of (85), we show that all the other summands converge 
to zero in P,; probability. 


n 


Because of a 2S (go(«) a 9, (a))* ae (Vx |go 


nN t=1 


yt ? and (68), 


~ 3 (gale) — 95,(a0))* 7% 0 (86) 
: 


holds. 


follows from 


2 (go(%:) — 9g (2) | S |g 


Next, (86) and 


1 2 1/2 n 1/2 
- — & (oat Xt) = 9s AC) es a » (go(@) = a5,00)') 2 DS e) 


n t1 t=1 
imply 

| a n 

= (Gola) — 9 (x)? €? —*+ 0 (87) 

2 t=1 “4 
and finally 

1 PS OSS 

Be (go(:) — 9% (x) &p > 0 

NM t=1 


results from (86), (87), and 


fe Y (gale) — 95, (a0)° 


t=1 


<a mh a bi »a\l? 
= (SE ned (g0( x) — 9g, (2t)) ] e ne (g0(%) = 99, (%1)) et) Ez 


2 t=1 
Lemma 1.1.8 and Theorem 1.1.8 imply that the statistic

$$T_n = \frac{\sqrt{n}\,(\hat\sigma_n^2 - \sigma_0^2)}{\sqrt{\hat\gamma_n}}$$

is asymptotically standard normally distributed under $\sigma_0^2$. Hence the test

$$\psi_n(Y_n) = \begin{cases} 1 & \text{if } T_n \ge u_\alpha, \\ 0 & \text{otherwise} \end{cases} \qquad (88)$$

is an asymptotic $\alpha$-test and it is a consistent one if $\varkappa$ has a finite moment of 6th
order. With respect to the limit distribution of the test statistics under local
alternatives (83), the following theorem is valid.


Theorem 1.1.11 $\mathcal{L}(T_n \mid \zeta_n) \to N\left(\dfrac{h}{\sqrt{\gamma}},\; 1\right)$ for $\varkappa \in \mathcal{K}_0$.


Proof. Because of T,, = Vy/¥, Z, and Theorem 1.1.10, it is sufficient to show 
that 


eo a at 
yy (89) 


(89) results from y cay y and from the contiguity of distribution sequences. I 

Moreover, the asymptotic local efficiency of the tests (82) and (88) is proved 
in Schmidt (1979). 
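
When quasinormality is not assumed, the statistic of the test (88) replaces $2\sigma_0^4$ by the estimate of $\gamma$ from Lemma 1.1.8. The following sketch shows one way to compute it from residuals; the function name is hypothetical, and $u_\alpha$ is assumed to be the upper $\alpha$-point of $N(0, 1)$.

```python
import math
from statistics import NormalDist

def studentized_variance_test(residuals, sigma0_sq, alpha=0.05):
    """Return (T_n, gamma_hat, reject) using the estimated gamma = D(eps^2)."""
    n = len(residuals)
    squares = [r * r for r in residuals]
    sigma_hat_sq = sum(squares) / n
    # Estimator of gamma: mean squared deviation of the squared residuals.
    gamma_hat = sum((s - sigma_hat_sq) ** 2 for s in squares) / n
    if gamma_hat == 0.0:
        raise ValueError("all squared residuals are equal; T_n is undefined")
    t = math.sqrt(n) * (sigma_hat_sq - sigma0_sq) / math.sqrt(gamma_hat)
    critical = NormalDist().inv_cdf(1.0 - alpha)   # assumed meaning of u_alpha
    return t, gamma_hat, t >= critical

# Artificial residuals: squares are 1 and 4, so sigma_hat^2 = 2.5 and gamma_hat = 2.25.
example_residuals = [1.0, -1.0, 2.0, -2.0] * 25
t_stat, gamma_hat, reject = studentized_variance_test(example_residuals, sigma0_sq=1.0)
```

The guard against a vanishing estimate of $\gamma$ is a design choice of this sketch; the text does not discuss that degenerate case.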

All results of this section are valid for the special case of the linear model
$g_\vartheta(x) = \alpha' x$ without the assumptions $A_5$, $A_6$, $A_7$, and $A_8$ (cf. Schmidt, 1979).
Besides this, the rate of convergence of the distributions of $\hat\sigma_n^2$ and of the test
statistics to the corresponding limiting distributions has been derived in that
case.


1.1.8 Tests and confidence regions for regression coefficients 
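In this section we restrict ourselves to the adequate model

$$y_t = g_{\vartheta_0}(x_t) + \varepsilon_t, \quad t = 1, 2, \ldots,$$

with identically distributed errors $\varepsilon_t$,

$$E\varepsilon_t = 0, \quad E\varepsilon_t^2 = \sigma^2, \quad \vartheta \in \Theta \subset \mathbb{R}^k,$$

and with

$$\vartheta = (\vartheta^{(1)\prime}, \vartheta^{(2)\prime})', \quad \vartheta^{(1)} \in \mathbb{R}^{k-s},$$

we consider the test problem

$$H: \vartheta^{(1)} = \vartheta_0^{(1)} \quad\text{against}\quad K: \vartheta^{(1)} \ne \vartheta_0^{(1)}, \qquad (90)$$

that is, the question whether too many parameters have been incorporated
into the model.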




For this an asymptotic «-test can be constructed using the limiting distri- 
bution of the LSE as follows. If 3, = (0, 32)’ is the LSE for #, then we 
would reject H if a suitable normalization of the distance from 6 to d) is 


according to Theorem 1.1.2, because of 82 := 62 —»+ o? and of the 
continuity of B-1(-), 


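$$Z_n = n\, S_n^{-2}\, (\hat\vartheta_n^{(1)} - \vartheta_0^{(1)})'\, B_{11}^{-1}(\hat\vartheta_n)\, (\hat\vartheta_n^{(1)} - \vartheta_0^{(1)})$$

is asymptotically central $\chi^2$-distributed with $k - s$ degrees of freedom under the
hypothesis. That means the test

$$\varphi_n(Y_n) = \begin{cases} 1 & \text{if } Z_n \ge \chi^2_{k-s;\alpha}, \\ 0 & \text{otherwise} \end{cases} \qquad (91)$$

is an asymptotic $\alpha$-test for the test problem (90). Moreover, the following
theorem yields the asymptotic power of this test for local alternatives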


Kopp ey 


Yn 


Dy — (9), a ae 


with 


Theorem 1.1.12 Let $h = (h^{(1)\prime}, h^{(2)\prime})' \in \mathbb{R}^k$ with $h^{(1)} \in \mathbb{R}^{k-s}$ and
$\delta^2 := \frac{1}{\sigma^2}\, h^{(1)\prime} B_{11}^{-1}(\vartheta_0)\, h^{(1)}$. Under $K_n$ the test statistic $Z_n$ has asymptotically a
noncentral $\chi^2$-distribution with $k - s$ degrees of freedom and the noncentrality
parameter $\delta^2$ if the assumptions of Theorem 1.1.2, part 1, and Lemma 1.1.6 are
satisfied.

Proof. Let €, = (#,, 07, x) and fy = (9%, o?, x). Because of the contiguity 


A Pat Pat i 
D, —> 9 and S? —-+ o* imply 


Pr, 
$6. —9, and Seo 
A Py, 
In particular, By 1(0-,) —~*> B-1(9,) results from this because of the conti- 
nuity of B(#). 




» Analogously as in the proof of Theorem 1.1.2, by Taylor expansion and using 
the contiguity one shows that 


Vn (Sn — Bn) = Bo) Vn (ko,, &)n + Op,y (1); 
which implies 

£(/n (Gn — 9) | Cn) > N(h, BU), 
and thus 


£(Bat(8,) Vn (92 — 8) | ¢,) > W(Bg%(9) h, o%ly_«)- 


; Pete : 
From this, because of S2 —»+ o? the assertion follows: 
Under fixed alternatives K: 9M + 94) 


Z, = nSz? |} — gaye 


BS.) 
Len = Py, 
= o* {fn (SY — 0) + Yn (0 — 62) 1,5 + o7,g(1) 2+ 00 
implies the consistency of the test (91). 


Under normal distribution and under the hypothesis one can prove (cf. 
Gallant, 1975) that 


i) 


k—s 


Py, 
TiVo pe arith 2, — +0 


and $V_n$ has a central $F$-distribution with $k - s$ and $n - k$ degrees of freedom
for any finite sample size. This could also give rise to the use of $(k - s)\, F_{k-s, n-k; \alpha}$
instead of $\chi^2_{k-s; \alpha}$. The resulting test is the usual $F$-test in the case of a linear
regression function.

A further principle often used in statistics is the concept of likelihood ratio 
test (LRT) introduced by Wald. In the following we consider the LRT for 
the test problem (90) under normality and discuss some asymptotic properties. 
In the adequate model 


YY: =gela) +&, $€=—1,2,... 
with ¢, ~ N(0, o?), for € = (8, o”), let 
Pre aay N(9$ (a), o2I;) 


be the distribution of Y, = (y;,...,yY,). With 2g := {€ | ¢ = (9,07) € 5}, 
the likelihood principle consists in rejecting the hypothesis H if the value of 
the loglikelihood ratio 


Z i n 6°, 
Va EY faye loge 
7} nh) a> SH) (Yn $) 9 08 Ge 


is too small. Here, as before, $L_n(Y_n, \zeta)$ is the logarithm of the density of $Y_n$
under $\zeta$, $\hat\zeta_n = (\hat\vartheta_n', \hat\sigma_n^2)'$, and $\hat\zeta_{nH}$ is the MLE in the model restricted by $H$. Hence
it holds that


bn == (Hy, Oy)’ 
with 

Dn = (OP, 5PY, Fie = Qn(Dn), 
and $% minimizes the least-squares criterion Q,(89, 8) over all possible 
values 3), 


To fix the $\alpha$-significance point we derive the limiting distribution of the
test statistic $-2\lambda_n$ under the hypothesis as well as under local alternatives

$$K_n: \vartheta_n = \vartheta_0 + \frac{h}{\sqrt{n}}.$$


Theorem 1.1.13 Under the assumptions of Theorem 1.1.2, part 1, and Lemma
1.1.6:

$$\mathcal{L}(-2\lambda_n \mid \zeta_n) \to \chi^2_{k-s}(\delta^2)$$

with $\zeta_n = (\vartheta_n, \sigma^2)$ and $\delta^2 = \frac{1}{\sigma^2}\, h^{(1)\prime} B_{11}^{-1}(\vartheta_0)\, h^{(1)}$.


(Hence the LRT is asymptotically equivalent to the test based on the limit
distribution of the LSE.)


Proof. Let 


0 a , , 
G= (;*-) € Messy xte-+19> h = (h’, 0)’ € Rt 


8+1 
and 

1 \ 
— BS) | 0 

o? ( 
(oy Pee = ea 
0 feels 
peers 


Under H, Y, has the limiting information matrix I(f,) := G'T (Co) G and as 
in the proofs of the Theorems 1.1.2 and 1.1.8 one shows that 


_ (62 — 9@ 4 Yn (ky, &)n . 
Vn ( meh G) Gal a ee ia ta Op(L). (92) 


63, — o? 




Because of 
LAY ;, é) ar DAY Cx) ae (Ga ee y CE ns 6) LY n, oUF ns ©) 
Oo leat 
bee pe OIAY a0) = | 
+ = Vm (§ — x) ar a ena (¢ — ox) 


with a suitable (ae 


GRAY OT aa 
Secrest. 


and with (92) we obtain, with T,, := Yn (— ¢)) and t = : (9, 0?) that 
—22 = Yn (§ — $n)’ Teo) Vm (6 — $x) + op, (1) 


= [T, — GI-4(60) G'I(£o) Tal’ L(S0o) (Ln — GL-4(E0) @'1(60) Tr] 
+ op, (1). 3 (93) 
Finally, the assertion follows from 


B(S) Vn (ko,> €)n 


7 x & — 0°) 
n 


and 7 (7, |'C,) > (h, Y he 1(£o)) as well as from the contiguity 


es = Yn a op, (1), Yn = 


L(V, | Cn) > N(h,I-(O)) 
and thus from (93). 
Remark 1.1.14 If we have no normal distribution, £(~n (§ —£)| t) 
—> N(0, I-1(¢)) with 


can be proved under suitable regularity assumptions as in Theorem 1.1.7 if 
¢ is the MLE for ¢ = (#&, o?)’. 


Thus in the general case, too, the y?-distribution is the limiting distribution 
of the likelihood ratio test statistic. This means that 


tit oA Sie, 


0 otherwise 




can be applied as an asymptotic «-test. Its asymptotic power for local alter- 
natives can be computed according to (91). In particular, the consistency of 
the LRT follows under fixed alternative parameters, since according to (93) 
and with 


A= [I — I(f) GI-*(2) 4] 1(C) UL — (0) GIN) GY 
—22 = lyn (& — 6) + Yn (6 — bo) 4 + 07, (1) 22> 0 
holds. 
The LRT under normal distribution has also been investigated by Gallant
(1975). For finite $n$ he proves that


tigi 1 ead 


with nv Sant 0 and computes the distribution of V. With that distribution the 
x-fractile of the distribution of 67,/6? is approximated, thus defining an ap- 
proximate LRT. 

The question as to which of the two stated tests should be used cannot
always be answered uniquely. Monte Carlo studies (cf. Gallant, 1975) suggest
that the approximate LRT is preferable to the approximate test based on the
normal distribution of the LSE if only nonlinear parameters are present. If
linear parameters enter the regression function, then both tests have
approximately the same power, and for computational reasons one would prefer
the test which is based on the approximation of the distribution of the LSE.

Now we state some confidence regions for #. Because of the well-known cor- 
respondence between «-tests and (1 — «) confidence regions, 


(9 | (9, — 8)’ BI(S,) (G, — 8) < S272.,} (94) 


is obviously a confidence region to the asymptotic level $1 - \alpha$. Besides
asymptotic considerations, approximations can be utilized here, too, for the
finite sample size. They are based on linearizations. Let $\varepsilon_t \sim N(0, \sigma^2)$. If $g_\vartheta$ were
linear in $\vartheta$, then

$$\frac{(n - k)\,\bigl(Q_n(\vartheta) - Q_n(\hat\vartheta_n)\bigr)}{k\, Q_n(\hat\vartheta_n)}$$


would correspond to the loglikelihood ratio and would possess an F-distri- 
bution. For the general nonlinear case the F-distribution could be used as a 
first approximation and one would obtain the approximate (1 — «) confidence 
region 


(9| On(8) — On(Dn) < kn S2F yn psa} 


proposed by Beale (1960) (up to certain terms of correction). 
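
The $F$-based approximation can be illustrated with a small computation. The sketch below is an assumption-laden illustration, not Beale's procedure with its correction terms: it tests membership of a candidate $\vartheta$ through the ratio $(n - k)(Q_n(\vartheta) - Q_n(\hat\vartheta_n)) / (k\, Q_n(\hat\vartheta_n))$, which does not depend on how $Q_n$ is normalized. The model $g_\vartheta(x) = e^{\vartheta x}$, the data, the grid search standing in for a proper least-squares routine, and all function names are hypothetical; the value 6.61 is the tabulated upper 5% point of $F_{1,5}$.

```python
import math

def sum_of_squares(theta, xs, ys, model):
    """Unnormalized least-squares criterion."""
    return sum((y - model(x, theta)) ** 2 for x, y in zip(xs, ys))

def f_ratio(theta, theta_hat, xs, ys, model):
    """(n - k)(Q(theta) - Q(theta_hat)) / (k Q(theta_hat))."""
    n, k = len(ys), len(theta_hat)
    q_hat = sum_of_squares(theta_hat, xs, ys, model)
    q = sum_of_squares(theta, xs, ys, model)
    return (n - k) * (q - q_hat) / (k * q_hat)

def in_region(theta, theta_hat, xs, ys, model, f_quantile):
    """True if theta lies in the approximate confidence region."""
    return f_ratio(theta, theta_hat, xs, ys, model) <= f_quantile

def model(x, theta):
    (b,) = theta
    return math.exp(b * x)

# Artificial data around b = 0.5 with small fixed perturbations.
xs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
noise = [0.01, -0.02, 0.015, -0.01, 0.02, -0.015]
ys = [math.exp(0.5 * x) + e for x, e in zip(xs, noise)]

# Coarse grid search as a stand-in for a least-squares routine.
grid = [(i / 1000.0,) for i in range(1001)]
theta_hat = min(grid, key=lambda th: sum_of_squares(th, xs, ys, model))

F_1_5_UPPER_5_PERCENT = 6.61  # table value for k = 1, n - k = 5
```

A call such as `in_region((0.5,), theta_hat, xs, ys, model, F_1_5_UPPER_5_PERCENT)` then reports whether the value 0.5 is covered.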




The form of the region (94) may be too complicated. If a Taylor expansion
up to second order of the left-hand side is applied, then one gets an
approximate $(1 - \alpha)$ confidence region which was proposed by Box and
Coutie (1956):


(9 | (Fn — 8) OF (Sn) (Hn + 9) < Qhn SF yn p:4) 


” ae e 
P= ( a .(0)). 


Linearization of the function gy by a Taylor expansion of first order implies 
the approximate (1 — «) confidence region 


with 


(9 | (Fn — BY BoD) (in — 8) < USP nese 
proposed by Goldfeld and Quandt (1972). 


1.2: Switching regression models 


1.2.1 Introduction 


Most of the papers on regression analysis start from observation models

$$y_t = f(x_t) + \varepsilon_t, \quad t = 1, \ldots, n,$$

with a regression function $f(x) = h(x, \alpha)$ with a constant parameter $\alpha$ for all
points $x$ from a given region $\mathcal{X}$. Often, however, a cause-and-effect relationship
that is to be described by the regression function $f(x) = h(x, \alpha)$ is not stable but
subject to changes.


Example 1.2.1 As is well known, the volume of a substance decreases under increasing pressure. In Eder (1968), a representation of the relative decrease in volume $y = \Delta V / V_0$ is to be found for some chemical elements for pressures up to $9.80665 \times 10^9$ Pa ($0 \le x \le 9.80665 \times 10^9$). The curves (Figure 1.2.1) for the elements caesium (Cs), barium (Ba), bismuth (Bi), and antimony (Sb) have points of discontinuity, which are called pressure fixed points and can be determined experimentally.

This suggests using for bismuth, for instance, a regression function of the form

$$f(x) = h_i(x, \alpha_i), \quad \gamma_{i-1} \le x \le \gamma_i, \quad i = 1, 2, \dots, 5,$$

with $\gamma_0 = 0$, $\gamma_5 = 9.80665$.


Example 1.2.2 The temperature-quantity of heat diagram of water shows the 
behaviour represented in Figure 1.2.2, where y denotes the temperature (°C) 


74 Chapter 1. Parameter estimation and testing hypotheses in nonlinear models 


[Figure 1.2.1: curves for Cs, Ba, Bi, and Sb; horizontal axis: pressure (milliard Pa), 0 to 10.]


Fig. 1.2.1. Compressibility of elements due to Eder (1968, p. 273) 


[Figure 1.2.2: temperature (°C) against quantity of heat (thousand J); the melting temperature (0 °C) and the boiling temperature (100 °C) are marked.]


Fig. 1.2.2. Temperature-quantity of heat diagram of water 


of the water (of the ice and the steam, respectively), and $x$ the heat absorbed (J). Then the relation between $y$ and $x$ is described by a regression function

$$f(x) = \begin{cases} a_1 + b_1 x, & 0 \le x \le \gamma_1, \\ c_1, & \gamma_1 \le x \le \gamma_2, \\ a_3 + b_3 x, & \gamma_2 \le x \le \gamma_3, \\ c_2, & \gamma_3 \le x \le \gamma_4, \\ a_5 + b_5 x, & \gamma_4 \le x, \end{cases}$$

and because of continuity we demand

$$a_1 + b_1\gamma_1 = c_1 = a_3 + b_3\gamma_2,$$
$$a_3 + b_3\gamma_3 = c_2 = a_5 + b_5\gamma_4.$$

Hence we speak of continuous changes of states.


Example 1.2.3 In semiconductor physics it is well known that the conductivity $\kappa$ of a semiconductor having imperfections changes strongly in dependence on the temperature $t$. The behaviour of $\ln \kappa$ depending on $x = 1/t$ is described according to Paul (1974) by Figure 1.2.3 and consequently by a regression function of the form

$$f(x) = h_i(x, \alpha_i), \quad \gamma_{i-1} \le x \le \gamma_i, \quad i = 1, 2, 3,$$

with $\gamma_0 = 0$, where because of continuity

$$h_1(\gamma_1, \alpha_1) = h_2(\gamma_1, \alpha_2) \quad\text{and}\quad h_2(\gamma_2, \alpha_2) = h_3(\gamma_2, \alpha_3)$$

has to be satisfied.


[Figure 1.2.3: conductivity curve.]

Fig. 1.2.3. Curve of conductivity in dependence on the temperature $t$ (due to Paul, 1974, p. 237, fig. 4.15)


In chemical engineering it is well known that the properties of objects are 
often subject to long-term changes. The reasons for this fact are uncontrollable 
disturbances such as changing activity of a catalyst, the ageing of the equip- 
ment, impurities of the raw materials, or atmospheric influences. Examples 
and some mathematical models applied in chemical engineering can be found 
for instance in Borodjuk and Lezki (1977). 

Further examples of models with changes of state from quality control, 
chemistry, biology, agriculture, water supply, astronomy, and economics are
given, e.g., by Barnard (1959), Dunicz (1969), Sprent (1961), Gallant and Fuller 
(1973), Bacon and Watts (1971), Schulze (1977b), Poirier (1973), Fair and 
Jaffee (1972), and McGee and Carleton (1970). 




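In order to cover the case of a change of the regression function in dependence on outside conditions, we introduce the variable $z$, where we suppose $z \in [a, b]$. In general, we consider (observation) models of the form

$$y_t = f(x_t, z_t) + \varepsilon_t \tag{1}$$

with

$$f(x_t, z_t) = g_\vartheta(x_t, z_t) = h_i(x_t, \alpha_i) \quad\text{if } \gamma_{i-1} < z_t \le \gamma_i, \quad i = 1, \dots, r,$$

$$\gamma_0 = a, \quad \gamma_r = b, \quad \vartheta = (\alpha_1', \dots, \alpha_r', \gamma_1, \dots, \gamma_{r-1})',$$

and independent random observation errors $\varepsilon_t$ with the expectation $E\varepsilon_t = 0$ for all $t$ and variances

$$D\varepsilon_t = \sigma_t^2 = \sigma_i^2 \quad\text{if } \gamma_{i-1} < z_t \le \gamma_i, \quad i = 1, \dots, r.$$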


Depending on the variable $z$, the model changes from a state $i$ characterized by the state function $h_i(\cdot,\cdot)$ and the state parameters $(\alpha_i, \sigma_i^2)$ to a state $i+1$ characterized by the state function $h_{i+1}(\cdot,\cdot)$ and the state parameters $(\alpha_{i+1}, \sigma_{i+1}^2)$. The change takes place at the change point $z = \gamma_i$. The variable $z$ causing the change of state affects the regression function and the distribution of the observation errors. The distribution of the errors is arbitrary, and $z$ affects the variance in the manner mentioned above. In particular, $z_t = t$ can denote the time; then the change of state depends on the time.

But $z_t = x_t$ or $z_t = k(x_t)$ can also hold, and the change then occurs in dependence on the variable $x$, as in Examples 1.2.1 to 1.2.3. The literature often treats models with $z = x \in \mathbb{R}^1$ and the condition

$$h_i(\gamma_i, \alpha_i) = h_{i+1}(\gamma_i, \alpha_{i+1}), \quad i = 1, \dots, r-1,$$

which guarantees that the state functions are continuously connected (Examples 1.2.2 and 1.2.3). We call these models models with continuous changes of state.

The state functions $h_i(x_t, \alpha_i)$ are assumed to be known up to their state parameters $\alpha_i$, where the $\alpha_i$ may be linear or nonlinear parameters. The state parameters $(\alpha_i, \sigma_i^2)$, $i = 1, \dots, r$, and the change points $\gamma_1, \dots, \gamma_{r-1}$ are unknown. Except in Section 1.2.6 we assume the state number $r$ to be known.

We aim at presenting a survey on methods of estimating the state parameters, the change points $\gamma_1, \dots, \gamma_{r-1}$ and derived parameters, respectively, and on tests to check the stability of the regression function. Many papers dealing with these problems have been published: Quandt (1958, 1960, 1972), Sprent (1961), Robison (1964), Hudson (1966), Hinkley (1969, 1971), Farley and Hinich (1970), Goldfeld and Quandt (1972, 1973), Gallant and Fuller (1973), Poirier (1973), Brown, Durbin and Evans (1975), Feder (1975a, b), Schulze (1977a, b). Nevertheless, we must note critically that most of the methods are based on heuristic principles and many theoretical questions are still open. This is especially true for distribution statements on tests for checking the state stability, and for investigations and comparisons of the power of the different tests. In applying the least squares method, practicable algorithms for determining the estimates are still missing for many models, so that we often have to use approximate estimates.

* $\gamma_0 < z$ is always read as $\gamma_0 \le z$.

This section mainly represents a review, with some complements from the theoretical point of view, of the methods applied in models with changes of state, where several open questions are referred to. First we are going to consider models with known $z_1, \dots, z_n$. We treat the weighted least squares estimate of the state parameters $\alpha_i$, $i = 1, \dots, r$, and of the change points $\gamma_1, \dots, \gamma_{r-1}$ and inquire into some test problems occurring in models with changes of state. While Section 1.2.2 deals with the general model (1), Section 1.2.3 is entirely dedicated to models with continuous changes of state. In Section 1.2.4 some asymptotic results are summarized. In particular we discuss sufficient conditions for the consistency of the weighted least squares estimates in the adequate as well as in the inadequate model and their relation to the conditions described in Section 1.1.5.

In all sections special emphasis is placed on models with linear state functions $h_i(x, \alpha_i) = \alpha_i' h_i(x)$, $i = 1, \dots, r$, and on models with linear state functions which differ only in their parameters $\alpha_i$:

$$h_i(x, \alpha_i) = \alpha_i' h(x), \quad i = 1, \dots, r.$$


In Section 1.2.5 we consider briefly some more special models with changes of state treated in the literature. Among them are models with random changes of state. Finally, in Section 1.2.6 we refer to some methods to identify changes of state in models with an unknown state number $r$.

Let us introduce some general assumptions and notation: 


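$$x_t \in \mathcal{X} \subset \mathbb{R}^k, \quad z_t \in [a, b] =: Z \subset \mathbb{R}^1,$$

$$\alpha_i \in A_i \subset \mathbb{R}^{p_i}, \quad i = 1, \dots, r,$$

$$h_i(\cdot,\cdot)\colon \mathcal{X} \times \mathbb{R}^{p_i} \to \mathbb{R}^1, \quad i = 1, \dots, r,$$

$$\gamma_i \in [c, d] \subset [a, b] =: Z, \quad i = 1, \dots, r-1, \quad a \le c < d \le b,$$

$$\gamma = (\gamma_1, \dots, \gamma_{r-1})' \in \Gamma := \{(\gamma_1, \dots, \gamma_{r-1})' \mid \gamma_i \in [c, d],\ \gamma_{i-1} \le \gamma_i,\ i = 2, \dots, r-1\}.$$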

The vector $\gamma$ of the change points we will also call the change point. If $\xi = ((x_1, z_1), \dots, (x_n, z_n))$ denotes an experimental design with the points $(x_t, z_t)$, $t = 1, \dots, n$, then $\Gamma^\xi$ denotes the set of all admissible change points $\gamma = (\gamma_1, \dots, \gamma_{r-1})'$ satisfying the condition that there exists at least one observation for each state, i.e., for each $i = 1, \dots, r$ there is a $t \in \{1, \dots, n\}$ with

$$\gamma_{i-1} < z_t \le \gamma_i \quad (\gamma_0 := a,\ \gamma_r := b).$$


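Then

$$\vartheta := (\alpha', \gamma')' \in A \times \Gamma^\xi =: \Theta^\xi$$

and

$$g_\vartheta(x, z) = \sum_{i=1}^{r} h_i(x, \alpha_i)\, 1_{(\gamma_{i-1}, \gamma_i]}(z).$$

With regard to the variances we assume $\sigma = (\sigma_1, \dots, \sigma_r)' \in (\mathbb{R}^+)^r$. Moreover,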
we use $y = (y_1, \dots, y_n)'$ and $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)'$.
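The representation of the regression function as a sum of state functions weighted by interval indicators can be sketched in a few lines. This is only an illustration under stated assumptions: the names `state_index` and `g` are hypothetical, the state functions are taken to be straight lines, and the half-open intervals $(\gamma_{i-1}, \gamma_i]$ are handled with a binary search over the interior change points.

```python
from bisect import bisect_left

def state_index(gammas, z):
    """Return the 0-based index i-1 of the state with gamma_{i-1} < z <= gamma_i.

    `gammas` holds the interior change points gamma_1 <= ... <= gamma_{r-1}.
    bisect_left counts the change points strictly smaller than z.
    """
    return bisect_left(gammas, z)

def g(states, alphas, gammas, x, z):
    """Evaluate the regression function: pick the state selected by z, apply it to x."""
    i = state_index(gammas, z)
    return states[i](x, alphas[i])

# Hypothetical example with r = 3 straight-line states h_i(x, (a, b)) = a + b*x.
line = lambda x, alpha: alpha[0] + alpha[1] * x
states = [line, line, line]
alphas = [(0.0, 1.0), (1.0, 0.0), (3.0, -1.0)]
gammas = [1.0, 2.0]

# Here z = x, as in Examples 1.2.1 to 1.2.3.
print([g(states, alphas, gammas, z, z) for z in (0.5, 1.0, 1.5, 2.0, 2.5)])
# [0.5, 1.0, 1.0, 1.0, 0.5]
```

A point lying exactly on a change point is assigned to the state on its left, which matches the convention $\gamma_{i-1} < z \le \gamma_i$.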


1.2.2 Ordered models with abrupt state switching 
1.2.2.1 The model 


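Let us assume there are $n$ independent observations $y_1, \dots, y_n$ for (1) according to an experimental design $\xi = ((x_1, z_1), \dots, (x_n, z_n))$ satisfying without loss of generality the condition $z_1 \le z_2 \le \dots \le z_n$. In this case we use the term 'ordered model'. Let $\gamma = (\gamma_1, \dots, \gamma_{r-1})' \in \Gamma^\xi$. Then there exist natural numbers $m_1 = m(\gamma_1) < \dots < m_{r-1} = m(\gamma_{r-1})$ such that

$$\begin{aligned}
y_t &= h_1(x_t, \alpha_1) + \varepsilon_t \quad\text{with } D\varepsilon_t = \sigma_1^2 && \text{for } t = 1, \dots, m_1, \\
y_t &= h_2(x_t, \alpha_2) + \varepsilon_t \quad\text{with } D\varepsilon_t = \sigma_2^2 && \text{for } t = m_1 + 1, \dots, m_2, \\
&\ \ \vdots \\
y_t &= h_r(x_t, \alpha_r) + \varepsilon_t \quad\text{with } D\varepsilon_t = \sigma_r^2 && \text{for } t = m_{r-1} + 1, \dots, n
\end{aligned} \tag{2}$$

and

$$E\varepsilon_t = 0, \quad t = 1, \dots, n.$$

The change of state always takes place between the points $(x_{m_i}, z_{m_i})$ and $(x_{m_i+1}, z_{m_i+1})$, where $m_i$ is the number of the last observation of the $i$th state. Thus, the $m_i$, $i = 1, \dots, r-1$, are defined by

$$m_i = m(\gamma_i) = \max\{t \mid z_t \le \gamma_i\} \tag{3}$$

and they are called change indices. The vector $m = m(\gamma) = (m(\gamma_1), \dots, m(\gamma_{r-1}))'$ we will also call the change index.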

Obviously the change index $m(\gamma)$ depends on the chosen experimental design $\xi$. For a fixed experimental design, each change point $\gamma \in \Gamma^\xi$ uniquely determines a change index $m(\gamma)$ given by (3). Hence we can define the set of admissible change indices

$$M^\xi = m(\Gamma^\xi) = \{m(\gamma) \mid \gamma \in \Gamma^\xi\}.$$

All $\gamma = (\gamma_1, \dots, \gamma_{r-1})' \in \Gamma^\xi$ with $z_{m_i} \le \gamma_i < z_{m_i+1}$, $i = 1, \dots, r-1$, have the same change index $m = (m_1, \dots, m_{r-1})'$. Generally, for a given experimental design $\xi$, the parameter $\gamma \in \Gamma^\xi$ will not be uniquely defined by the expectations $f(x_t, z_t)$, $t = 1, \dots, n$. But often at least the change index $m = m(\gamma)$ is uniquely determined by the experimental design, as the following example will show.


Example 1.2.4 Let

$$f(x) = \begin{cases} a_1 + b_1 x, & x \le \gamma, \\ a_2 + b_2 x, & \gamma < x, \end{cases} \qquad \gamma \in [1, 3/2],$$

and $\xi = \left(\tfrac{1}{10}, \tfrac{2}{10}, \dots, \tfrac{20}{10}\right)$. Then it follows that $\Gamma^\xi = [1, 3/2]$, $M^\xi = \{10, 11, \dots, 15\}$, and $\gamma$ is not uniquely identifiable from $f(x_t)$, $t = 1, \dots, n$. But the change index $m = m(\gamma) \in M^\xi$ is uniquely determined provided that at least one change of state takes place, i.e. $(a_1, b_1) \ne (a_2, b_2)$ is valid.
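The set of admissible change indices in Example 1.2.4 can be checked numerically. The following is a minimal sketch, assuming the design $z_t = t/10$, $t = 1, \dots, 20$; the helper name `change_index` and the grid of candidate change points are illustrative choices, not part of the original text.

```python
from fractions import Fraction

def change_index(z, gamma):
    """Return m(gamma) = max{t | z_t <= gamma} for an ordered design z_1 <= ... <= z_n.

    Indices are 1-based as in (3); 0 is returned if no z_t is <= gamma.
    Because the design is ordered, counting the z_t <= gamma gives the maximum index.
    """
    return sum(1 for z_t in z if z_t <= gamma)

# Design of Example 1.2.4: z_t = t/10, t = 1, ..., 20 (exact fractions avoid rounding issues).
design = [Fraction(t, 10) for t in range(1, 21)]

# Scan a fine grid of change points gamma in [1, 3/2] (the grid is an illustrative choice).
grid = [Fraction(1) + Fraction(k, 1000) for k in range(0, 501)]
admissible = sorted({change_index(design, g) for g in grid})
print(admissible)  # [10, 11, 12, 13, 14, 15]
```

Every $\gamma$ in $[1.2, 1.3)$, for example, yields the same change index 12, which illustrates why $\gamma$ itself is not identifiable from the design points.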


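Obviously, the vector of the values of the regression function on the experimental design $\xi$ and the covariance matrix of the vector of all observations $y = (y_1, \dots, y_n)'$ depend on the parameter $\gamma$ only via the corresponding change index $m = m(\gamma)$. So we can write

$$f(x_t, z_t) = g_\vartheta(x_t, z_t) = g_{(\alpha,\gamma)}(x_t, z_t) = g_{(\alpha, m(\gamma))}(x_t, z_t), \quad t = 1, \dots, n,$$

and

$$\Sigma(\gamma) = \Sigma(m(\gamma)) \in \mathcal{V}_m,$$

where

$$\mathcal{V}_m := \{\Sigma(m) = \operatorname{Diag}[\sigma_1^2 I_{m_1}, \sigma_2^2 I_{m_2 - m_1}, \dots, \sigma_r^2 I_{n - m_{r-1}}] \mid \sigma = (\sigma_1, \dots, \sigma_r)' \in (\mathbb{R}^+)^r\}.$$

Under the assumptions made so far we have

$$y_t = g_{(\alpha, m)}(x_t, z_t) + \varepsilon_t, \quad t = 1, \dots, n, \quad E\varepsilon = 0, \quad D\varepsilon = \Sigma(m) \tag{4}$$

with $\alpha \in A$, $\Sigma(m) \in \mathcal{V}_m$, $m \in M^\xi$.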
In the case of linear state functions $h_i(x, \alpha_i) = \alpha_i' h_i(x)$, we introduce the $r$-block matrix

$$H(m) = \operatorname{Diag}[H_1(m), H_2(m), \dots, H_r(m)]$$

with $m = m(\gamma)$, where the $i$th matrix $H_i(m)$ denotes the design matrix with the rows $h_i'(x_t)$, $t = m_{i-1}+1, \dots, m_i$ ($m_0 := m(\gamma_0) = m(a) = 0$, $m_r := m(\gamma_r) = m(b) = n$). The model (4) has the representation

$$y = H(m)\alpha + \varepsilon, \quad E\varepsilon = 0, \quad D\varepsilon = \Sigma(m), \qquad \alpha \in A, \quad \Sigma(m) \in \mathcal{V}_m, \quad m \in M^\xi. \tag{5}$$


For linear state functions $\alpha_i' h_i(x)$ we always suppose (if no other assumption is made) that $\alpha_i \in A_i = \mathbb{R}^{p_i}$ and $\alpha \in A = \mathbb{R}^p$, respectively.

If the change index $m$ is known, (5) describes a linear model. But if $m$ is unknown, it is clearly a nonlinear model. The demand that, given an experimental design $\xi$, for each $m \in M^\xi$ the state parameters $\alpha_1, \dots, \alpha_r$ are uniquely determined by $f(x_t, z_t) = g_\vartheta(x_t, z_t)$ is equivalent to

$$r[H(m)] = p \quad\text{for all } m \in M^\xi; \tag{6}$$

hence it is required that $\alpha$ is identifiable in each model with fixed $m$. This is equivalent to the existence of an unbiased linear estimate of $\alpha$ in the model

$$y = H(m)\alpha + \varepsilon, \quad E\varepsilon = 0, \quad D\varepsilon = \Sigma(m), \quad \alpha \in \mathbb{R}^p, \quad \Sigma(m) \in \mathcal{V}_m \tag{7}$$

with fixed $m \in M^\xi$.

In the model from Example 1.2.4 the condition (6) trivially holds.


1.2.2.2 Least squares estimators 


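A weighted least squares estimator (WLSE) $\hat\vartheta = (\hat\alpha', \hat\gamma')'$ is defined to be any solution of the minimization problem

$${}^{w}|y - g_{\hat\vartheta}|_n = \min_{\vartheta \in \Theta^\xi} {}^{w}|y - g_\vartheta|_n$$

with

$${}^{w}|y - g_\vartheta|_n = n^{-1} \sum_{t=1}^{n} w_t \bigl(y_t - g_\vartheta(x_t, z_t)\bigr)^2$$

(cf. Definition 1.1.1).

Concerning the weighting $w_t = w(z_t)$ we assume that it depends on the change point $\gamma$ only via the induced change index $m(\gamma)$. This is fulfilled in the case of a GLSE ($w_t = \sigma_i^{-2}$ for $m(\gamma_{i-1}^0) < t \le m(\gamma_i^0)$, $i = 1, \dots, r$) and an OLSE ($w_t = 1$). Then also

$${}^{w}|y - g_\vartheta|_n = n^{-1} \sum_{i=1}^{r} \sum_{t=m_{i-1}+1}^{m_i} w_t \bigl(y_t - h_i(x_t, \alpha_i)\bigr)^2 \quad (m_0 := 0,\ m_r := n)$$

depends on $\gamma$ only via the corresponding change index $m = m(\gamma)$, and the WLSE $\hat\gamma$ is not uniquely determined. Let $\hat m = (\hat m_1, \dots, \hat m_{r-1})'$ be a WLSE of $m = m(\gamma)$; then each $\hat\gamma = (\hat\gamma_1, \dots, \hat\gamma_{r-1})' \in \Gamma^\xi$ with $m(\hat\gamma) = \hat m$ is a WLSE of $\gamma$. The condition $m(\hat\gamma) = \hat m$ is equivalent to $\hat\gamma_i \in [z_{\hat m_i}, z_{\hat m_i + 1})$, $i = 1, \dots, r-1$.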




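A WLSE $\hat\vartheta = (\hat\alpha', \hat\gamma')'$ can be calculated in two steps. First, for each fixed $m \in M^\xi$ the WLSE $\hat\alpha_m := (\hat\alpha_1'(m), \dots, \hat\alpha_r'(m))'$ as a solution of the minimization problems

$$\sum_{t=m_{i-1}+1}^{m_i} w_t \bigl(y_t - h_i(x_t, \hat\alpha_i(m))\bigr)^2 = \min_{\alpha_i \in A_i} \sum_{t=m_{i-1}+1}^{m_i} w_t \bigl(y_t - h_i(x_t, \alpha_i)\bigr)^2 =: S_i(m), \quad i = 1, \dots, r,$$

and the corresponding residual sum $S(m) := \sum_{i=1}^{r} S_i(m) = n\, {}^{w}|y - g_{(\hat\alpha_m, m)}|_n$ are calculated. Afterwards an $\hat m \in M^\xi$ is determined providing a minimal residual sum with respect to $m \in M^\xi$. Then $(\hat\alpha_{\hat m}', \hat m')'$ and $(\hat\alpha_{\hat m}', \hat\gamma')'$, $\hat\gamma = (\hat\gamma_1, \dots, \hat\gamma_{r-1})'$ with $\hat\gamma_i \in [z_{\hat m_i}, z_{\hat m_i+1})$, $i = 1, \dots, r-1$, are WLSEs for $(\alpha', m')'$ and $\vartheta = (\alpha', \gamma')'$, respectively. A WLSE $\hat\vartheta$ exists if $\hat\alpha_m$ exists for all $m \in M^\xi$. The existence of $\hat\alpha_m$ is for instance ensured if the state functions $h_i(x, \alpha_i)$ are continuous on $\mathcal{X} \times A_i$ and if the $A_i$ are compact sets.

Since the number of elements in $M^\xi$ is mostly very large even in simple models, the determination of $\hat m$ causes much computational effort. Moreover, in the case of nonlinear state functions $h_i(x, \alpha_i)$ iterative techniques for determining the $\hat\alpha_i(m)$ are used. In the case of linear state functions $h_i(x, \alpha_i) = \alpha_i' h_i(x)$ with $\alpha_i \in A_i = \mathbb{R}^{p_i}$, $\hat\alpha_m = (\hat\alpha_1'(m), \dots, \hat\alpha_r'(m))'$ exists for all $m \in M^\xi$ and hence a WLSE $\hat\vartheta$ exists as well. Here each $\hat\alpha_i(m)$ solves the weighted normal equations

$$H_i'(m)\, W_i(m)\, H_i(m)\, \hat\alpha_i(m) = H_i'(m)\, W_i(m)\, Y_i(m)$$

with

$$Y_i(m) = (y_{m_{i-1}+1}, \dots, y_{m_i})' \quad\text{and}\quad W_i(m) = \operatorname{Diag}[w_{m_{i-1}+1}, \dots, w_{m_i}].$$

In the case $r[H_i(m)] = p_i$, $i = 1, \dots, r$, and $w_t > 0$ for all $t = 1, \dots, n$, $\hat\alpha_m$ is an unbiased estimator of $\alpha$ in the model (7) with fixed change index $m$.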
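The two-step computation described above can be illustrated for the simplest case. The sketch below assumes $r = 2$ states, straight-line state functions, and unit weights $w_t = 1$ (an OLSE); the function names `fit_line` and `two_step_olse` and the minimum segment length `min_obs` are hypothetical choices made for the illustration. It searches all admissible change indices exhaustively and makes no attempt at the computational savings discussed in the text.

```python
def fit_line(xs, ys):
    """OLS fit of y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, rss

def two_step_olse(xs, ys, min_obs=2):
    """Exhaustive search over change indices m for r = 2 straight-line states.

    Step 1: for each fixed m, fit both states separately and form S(m) = S_1(m) + S_2(m).
    Step 2: return the m with minimal S(m), together with the fitted state parameters.
    """
    n = len(xs)
    best = None
    for m in range(min_obs, n - min_obs + 1):  # state 1: t = 1..m, state 2: t = m+1..n
        a1, b1, s1 = fit_line(xs[:m], ys[:m])
        a2, b2, s2 = fit_line(xs[m:], ys[m:])
        total = s1 + s2
        if best is None or total < best[0]:
            best = (total, m, (a1, b1), (a2, b2))
    return best

# Illustrative noise-free data on the design of Example 1.2.4 with a switch after t = 12.
xs = [t / 10 for t in range(1, 21)]
ys = [1 + 2 * x if t <= 12 else 5 - x for t, x in enumerate(xs, start=1)]
s_min, m_hat, alpha1_hat, alpha2_hat = two_step_olse(xs, ys)
print(m_hat)  # 12
```

Any $\hat\gamma$ in $[z_{\hat m}, z_{\hat m + 1})$, here $[1.2, 1.3)$, is then an estimate of the change point.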

The computational effort for determining the $\hat\alpha_i(m)$ and $\hat m$ can be reduced by using updating techniques. For models with linear state functions $\alpha_i' h(x)$, $i = 1, \dots, r$, Guthery (1974), starting from Bellman's optimization principle and using the updating technique, suggested a dynamic program to compute an OLSE $\hat\vartheta$ which noticeably reduces the computational effort.

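For applications and for asymptotic investigations it is useful to assume lower bounds for the state lengths $\gamma_i - \gamma_{i-1}$, i.e. to demand that $\gamma_i - \gamma_{i-1} \ge \delta_i > 0$ for given bounds $\delta_i$, $i = 1, \dots, r$, and to introduce the corresponding parameter spaces

$$\Gamma_\delta := \{\gamma \in \Gamma \mid \gamma_i - \gamma_{i-1} \ge \delta_i,\ i = 1, \dots, r\}, \quad \Gamma_\delta^\xi := \Gamma_\delta \cap \Gamma^\xi,$$

$$\Theta_\delta := A \times \Gamma_\delta, \quad \Theta_\delta^\xi := A \times \Gamma_\delta^\xi \subset A \times \Gamma^\xi =: \Theta^\xi,$$

$$M_\delta^\xi := \{m(\gamma) \mid \gamma \in \Gamma_\delta^\xi\} \subset M^\xi \quad\text{with } \delta := (\delta_1, \dots, \delta_r)'.$$

For $\delta = 0$, $\Theta_0^\xi = \Theta^\xi$, $M_0^\xi = M^\xi$.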




The WLSE with respect to $\Theta_\delta^\xi$ can also be calculated in two steps. It has properties similar to those of the WLSE with respect to $\Theta^\xi$. We will denote it by $\hat\vartheta^\delta$. Formally, $\hat\vartheta = \hat\vartheta^0$ holds. If

$$\max_{1 \le i \le r} \delta_i < \min_{2 \le t \le n} \{|z_t - z_{t-1}| : z_t \ne z_{t-1}\}$$

is valid, it obviously follows that $\Theta^\xi = \Theta_\delta^\xi$ and consequently $\hat\vartheta = \hat\vartheta^\delta$.

Since we do not know the true state lengths $\gamma_i^0 - \gamma_{i-1}^0$ ($i = 1, \dots, r$), we want to choose the $\delta_i$, $i = 1, \dots, r$, as small as possible. On the other hand, large $\delta_i$ may be of advantage for the numerical determination of $\hat\vartheta^\delta$. If one $\delta_i$ is chosen too large, i.e. if $\gamma_i^0 - \gamma_{i-1}^0 < \delta_i$ for an $i \in \{1, \dots, r\}$, then the true parameter $\gamma^0$ does not belong to the parameter space $\Gamma_\delta^\xi$, and the model becomes inadequate.

In Section 1.2.4 we shall refer to sufficient conditions for the consistency of $\hat\vartheta^\delta$ and $\hat\vartheta$ (Theorems 1.2.1, 1.2.2 and Remarks 1.2.3 and 1.2.4).


1.2.2.3 Testing the presence of a state switching 


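First we want to compare the states in two groups $A$ and $B$ of observations, where we assume for simplicity that

$$y_t = \alpha'(A)\, h(x_t) + \varepsilon_t, \quad t \in A, \quad \alpha(A) \in \mathbb{R}^{p_1},$$
$$y_t = \alpha'(B)\, h(x_t) + \varepsilon_t, \quad t \in B, \quad \alpha(B) \in \mathbb{R}^{p_1},$$

with independently $N(0, \sigma^2)$ distributed $\varepsilon_t$. Hence, the regression functions (state functions) of the groups $A$ and $B$ have the same form and differ at most in their (state) parameters $\alpha(A)$ and $\alpha(B)$. These assumptions lead to the linear model

$$y = \begin{pmatrix} y_A \\ y_B \end{pmatrix} = \begin{pmatrix} H(A) & 0 \\ 0 & H(B) \end{pmatrix} \begin{pmatrix} \alpha(A) \\ \alpha(B) \end{pmatrix} + \varepsilon \tag{8}$$

with $\varepsilon \sim N(0, \sigma^2 I_n)$, $n = n(A) + n(B)$.

For any set $\Delta$ let $n(\Delta)$ denote the sample size of the group $\Delta$, $y_\Delta$ the vector of observations $y_t$ from $\Delta$, and $H(\Delta)$ the design matrix of the group $\Delta$ with the rows $h'(x_t)$, $t \in \Delta$.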

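A change of state occurs iff $\alpha(A) \ne \alpha(B)$, and the test problem concerned in the model (8) is

$$H\colon \alpha(A) = \alpha(B) \quad\text{against}\quad K\colon \alpha(A) \ne \alpha(B)$$

or equivalently, using $\alpha = \begin{pmatrix} \alpha(A) \\ \alpha(B) \end{pmatrix}$:

$$H\colon (I_{p_1} \,\vdots\, -I_{p_1})\,\alpha = 0 \quad\text{against}\quad K\colon (I_{p_1} \,\vdots\, -I_{p_1})\,\alpha \ne 0. \tag{9}$$

This test problem is testable iff

$$p_1 = r[H(A)] = r[H(B)]. \tag{10}$$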




The theory of testing linear hypotheses (cf. Bunke and Bunke, 1986) provides the test statistic

$$F(A, B) = \frac{n(A) + n(B) - 2p_1}{p_1} \cdot \frac{\|H(A)\hat\alpha(A) - H(A)\hat\alpha(A \cup B)\|^2 + \|H(B)\hat\alpha(B) - H(B)\hat\alpha(A \cup B)\|^2}{\|y_A - H(A)\hat\alpha(A)\|^2 + \|y_B - H(B)\hat\alpha(B)\|^2} \underset{H}{\sim} F_{p_1,\, n(A)+n(B)-2p_1}, \tag{11}$$

where $\hat\alpha(A)$, $\hat\alpha(B)$ and $\hat\alpha(A \cup B)$ denote the OLSE based on the observations of the group $A$, $B$ and the union $A \cup B$. $\|z\|$ is defined as the Euclidean norm $\|z\| = (z'z)^{1/2}$.

The notation '$\underset{H}{\sim}$' used in (11) means that under the hypothesis $H$ the considered statistic has the given distribution. The hypothesis $H$ is rejected if $F(A, B) \ge F_{p_1,\, n(A)+n(B)-2p_1;\,\alpha}$ is valid. The $\alpha$-test constructed from (11) is equivalent to the likelihood ratio test. Under the assumption (10) the statistic testing (9), which was given in Bunke and Bunke (1986, p. 269), coincides with the statistic $F(A, B)$ in (11). If the condition (10) is fulfilled for all sufficiently large sample sizes $n(A)$ and $n(B)$, and if

$$\lambda_{\min}[H'(A) H(A)] \xrightarrow[n(A) \to \infty]{} \infty \quad\text{and}\quad \lambda_{\min}[H'(B) H(B)] \xrightarrow[n(B) \to \infty]{} \infty,$$

then according to Bunke and Bunke (1986, Theorem 5.2.4), the sequence of tests based on (11) is consistent. Chow (1960) and Quandt (1960) suggested further tests for checking (15) based on intuitive principles.
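The statistic (11) is straightforward to evaluate once the three least squares fits are available. The following sketch assumes the special case $h(x) = (1, x)'$, so that $p_1 = 2$; the names `fit_line` and `f_statistic` are hypothetical, and the data are made up for illustration. The critical value $F_{p_1,\, n(A)+n(B)-2p_1;\,\alpha}$ is not computed here and would have to be taken from tables or a statistics library.

```python
def fit_line(xs, ys):
    """OLS fit of y = a + b*x; returns the coefficients (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def f_statistic(xa, ya, xb, yb):
    """Evaluate F(A, B) from (11) for straight-line state functions (p_1 = 2)."""
    p1 = 2
    a_a, b_a = fit_line(xa, ya)            # OLSE from group A
    a_b, b_b = fit_line(xb, yb)            # OLSE from group B
    a_u, b_u = fit_line(xa + xb, ya + yb)  # OLSE from the union of A and B
    # Numerator: squared distance between separate and pooled fitted values.
    num = sum(((a_a + b_a * x) - (a_u + b_u * x)) ** 2 for x in xa)
    num += sum(((a_b + b_b * x) - (a_u + b_u * x)) ** 2 for x in xb)
    # Denominator: residual sums of squares of the two separate fits.
    den = sum((y - a_a - b_a * x) ** 2 for x, y in zip(xa, ya))
    den += sum((y - a_b - b_b * x) ** 2 for x, y in zip(xb, yb))
    n = len(xa) + len(xb)
    return (n - 2 * p1) / p1 * num / den

# Made-up data: group A rises with slope about 1, group B is roughly constant.
xa = [1.0, 2.0, 3.0, 4.0, 5.0]
ya = [1.1, 1.9, 3.2, 3.8, 5.1]
xb = [6.0, 7.0, 8.0, 9.0, 10.0]
yb = [2.1, 1.8, 2.2, 1.9, 2.0]
print(f_statistic(xa, ya, xb, yb))
```

The numerator equals the difference between the pooled and the separate residual sums of squares, so the same value is obtained from the familiar form based on residual sums of squares.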


Remark 1.2.1 Here we admit different variances $\sigma^2(A)$ and $\sigma^2(B)$ for the observations in $A$ and $B$, respectively. If we consider the test problem $H\colon \alpha(A) = \alpha(B)$ against $K\colon \alpha(A) \ne \alpha(B)$, i.e. if we are only interested in the change of the parameter of the state function, then we have a generalized Behrens-Fisher problem. For the special case $h(x) = 1$ it is identical with the Behrens-Fisher problem. Toyoda (1974) and Schmidt and Sickles (1977) investigated the distribution of the statistic $F(A, B)$ in (11) under $H\colon \alpha(A) = \alpha(B)$. A compact representation of this distribution was not found. Schmidt and Sickles derived an exact formula for the probability $P\{F(A, B) \ge f\}$, which depends in a complicated manner on the ratio $\sigma^2(B)/\sigma^2(A)$. They especially showed that

$$P\{F(A, B) \ge F_{p_1,\, n(A)+n(B)-2p_1;\,\alpha}\} < \alpha \tag{12}$$

may hold for $\alpha(A) = \alpha(B)$ and $\sigma^2(A) \ne \sigma^2(B)$ and thus the level of significance $\alpha$ is not kept exactly.

Jayatissa (1977) constructed a statistic having an F-distribution under $\alpha(A) = \alpha(B)$ and $\sigma^2(A) \ne \sigma^2(B)$ so that the test based on this statistic exactly keeps the level of significance. On the other hand, we have a change of state in the model (8) with variances $\sigma^2(A)$ and $\sigma^2(B)$ iff $\alpha(A) \ne \alpha(B)$ or $\sigma^2(A) \ne \sigma^2(B)$ holds, hence it might be useful to consider the 'extended' test problem

$$H\colon \alpha(A) = \alpha(B),\ \sigma^2(A) = \sigma^2(B) \quad\text{against}\quad K\colon \alpha(A) \ne \alpha(B) \text{ or } \sigma^2(A) \ne \sigma^2(B). \tag{13}$$

Under the extended hypothesis, the test statistic $F(A, B)$ of (11) has the F-distribution given above so that it provides an exact $\alpha$-test for checking (13). Because of (12) it is indeed not unbiased.

Now we return to our observation model (5), where we assume the $\varepsilon_t$ to be normally distributed. We restrict ourselves to models with at most $r = 2$ linear state functions $h_i(x, \alpha_i) = \alpha_i' h(x)$, $i = 1, 2$. If the change index $m$ is known, then the problem

$H$: There is no change of state during the observations $t = 1, \dots, n$

against

$K_m$: The change of state takes place exactly after the $m$th observation

can be checked with the test statistic $F(A, B)$ of (11) by setting

$$A := \{1, \dots, m\} \quad\text{and}\quad B := \{m+1, \dots, n\}. \tag{14}$$

The resulting test statistic we denote by $T_m$.
Usually just the change index $m$ is unknown and the test problem which is of interest is

$H$: There is no change of state during the observations $t = 1, \dots, n$

against (15)

$K$: There exists an $m \in M^\xi$ such that the change of state occurs exactly after the $m$th observation.

Then obviously $K = \bigcup_{m \in M^\xi} K_m$ holds.

Under the assumption

$$r[H(m)] = 2p_1 \quad\text{for all } m \in M^\xi \tag{16}$$

the likelihood ratio statistic $\lambda$ for testing (15) is

$$\lambda = \min_{m \in M^\xi} \lambda_m = \lambda_{\hat m},$$

where $\lambda_m$ and $\hat m$ denote the likelihood ratio statistic for checking $H$ against $K_m$ and the MLE of $m$, respectively. The distribution of $\lambda$ is unknown. Empirical investigations by Quandt (1960) showed that $-2 \ln \lambda$ asymptotically has no central $\chi^2$-distribution for $n \to \infty$.
The application of the union-intersection principle proposed by Roy (1953) provides a further possibility for testing (15). It leads to accepting the hypothesis $H$ iff all individual tests for checking $H$ against $K_m$ accept $H$. Under the assumption (16) this principle leads to the acceptance of the hypothesis $H$ iff

$$\max_{m \in M^\xi} T_m < F_{p_1,\, n(A)+n(B)-2p_1;\,\alpha}$$

holds, where the statistics $T_m$ are defined by (11) and (14). If $|M^\xi|$ denotes the number of elements in $M^\xi$, then $|M^\xi|\,\alpha$ is an upper bound for the level of significance of this test.

Referring to the above property of the likelihood ratio statistic, Quandt (1960) proposed to check the problem $H$ against $K = \bigcup_{m \in M^\xi} K_m$ with a test that is used for testing $H$ against $K_{m_0}$, where $m_0$ is to be chosen 'properly'. He proposed to use $m_0 = \hat m$ ($\hat m$ is the MLE of $m$), or

$$m_0 = \begin{cases} \dfrac{n}{2} & \text{if } n \text{ is even}, \\[1ex] \dfrac{n-1}{2} \text{ or } \dfrac{n+1}{2} & \text{if } n \text{ is odd}. \end{cases}$$

Some considerations concerning the goodness of such tests are to be found in Quandt (1960). There are no general theoretical statements on the power. But it is clear that the power heavily depends on the distance of the used $m_0$ from the true change index $m^0$.

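Other tests, which are independent of the choice of an $m_0$, can be obtained by the following idea. The test problem (15) has the representation

$$H\colon y = H_n \alpha_1 + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n), \quad \alpha_1 \in \mathbb{R}^{p_1}, \quad \sigma \in \mathbb{R}^+$$

against

$$K\colon y = H_n \alpha_1 + Z\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n), \quad \alpha_1 \in \mathbb{R}^{p_1}, \quad \beta = (\alpha_2 - \alpha_1) \in \mathbb{R}^{p_1} \setminus \{0\}, \quad \sigma \in \mathbb{R}^+, \tag{17}$$

$$Z \in \mathcal{Z}(M^\xi) := \left\{ \begin{pmatrix} 0 \\ H_2(m) \end{pmatrix} \,\middle|\, m \in M^\xi \right\},$$

where $H_n$ denotes the design matrix under $H$ with the rows $h'(x_t)$, $t = 1, \dots, n$, and $H_2(m)$ the design matrix with the rows $h'(x_t)$, $t = m+1, \dots, n$.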

We propose to check the problem (17) with the tests for model testing described, e.g., in Ramsey (1969) and Thalheim (1977) (RESET: Regression Specification Error Test, RASET: Rank Specification Error Test, KOMSET: Kolmogorov's Specification Error Test, BAMSET: Bartlett's M Specification Error Test). These tests are used to test regression specification errors of the form described in the alternative $K$ in (17), but where $Z \in M_{n \times p_1} \setminus \{0_{n \times p_1}\}$ is assumed, i.e. these tests are used to check the hypothesis $H$ in (17) against a larger alternative than $K$ in (17). Besides deviations from the model by changes of state, this larger alternative also includes other forms of deviations from $H$ in (17), e.g. the lack of essential variables in a stable model. If those tests lead to a rejection of the hypothesis $H$ in (17), it can only be concluded that we have a deviation from the model, but without additional information it cannot be concluded that this deviation from the model consists in a change of state. Since the mentioned tests are not specific for models with changes of state, we omit a detailed presentation. We only refer to Thalheim (1977), where the necessary tables are also contained. The tests obtained in this way are based on residuals that are, in a certain sense, optimal linear unbiased (Thalheim).

Two further tests for checking the constancy of a state, which are based on normed recursive residuals, were proposed by Brown, Durbin, and Evans (1975). They started from the model

$$y_t = \alpha_t' h(x_t) + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma_t^2), \quad t = 1, \dots, n, \quad \alpha_t \in \mathbb{R}^p, \quad \sigma_t \in \mathbb{R}^+,$$

which allows changes of state after each observation. They developed so-called cumulative sum tests to check the hypothesis

$$H\colon \alpha_1 = \alpha_2 = \dots = \alpha_n = \alpha, \quad \sigma_1^2 = \sigma_2^2 = \dots = \sigma_n^2 = \sigma^2.$$

These tests are based on the normed recursive residuals

$$v_t = \frac{y_t - (\hat\alpha(t-1))' h(x_t)}{\sqrt{1 + h'(x_t)\,[H_{t-1}' H_{t-1}]^{-1}\, h(x_t)}}, \quad t = p+1, \dots, n,$$

which are independently $N(0, \sigma^2)$-distributed under the hypothesis. Here $\hat\alpha(t-1)$, $(\hat\alpha(t-1))' h(x_t)$ and $H_{t-1}$ are the OLSE of the regression coefficient, the prediction of $y_t$ from the first $t-1$ observations and the design matrix with the rows $(h(x_i))'$, $i = 1, \dots, t-1$. The matrices $H_{t-1}$ are assumed to be of full rank.
Let

$$S(n) = \sum_{t=1}^{n} \bigl(y_t - (\hat\alpha(n))' h(x_t)\bigr)^2 = \sum_{t=p+1}^{n} v_t^2,$$

and let $\hat\sigma^2(n) = (n-p)^{-1} S(n)$ be a consistent estimate of $\sigma^2$. The simple cumulative sum test is based on the random variables

$$W_r = \frac{1}{\hat\sigma(n)} \sum_{t=p+1}^{r} v_t, \quad r = p+1, \dots, n,$$

and the quadratic cumulative sum test on the random variables

$$Q_r = \frac{1}{S(n)} \sum_{t=p+1}^{r} v_t^2, \quad r = p+1, \dots, n.$$


The hypothesis $H$ is accepted iff

$$|W_r| \le 2a(r-p)(n-p)^{-1/2} + a(n-p)^{1/2} \quad\text{for all } r = p+1, \dots, n$$

and

$$\left| Q_r - \frac{r-p}{n-p} \right| \le c \quad\text{for all } r = p+1, \dots, n, \text{ respectively},$$

i.e. if all $W_r$ lie between the straight lines running through the points $(p+1, a\sqrt{n-p})$ and $(n, 3a\sqrt{n-p})$ and $(p+1, -a\sqrt{n-p})$ and $(n, -3a\sqrt{n-p})$ (cf. Figure 1.2.4) and if all $Q_r$ lie between the lines running parallel to the straight line $g(r) = \dfrac{r-p}{n-p}$ in the distance $c$, respectively (cf. Figure 1.2.5).

Fig. 1.2.4. Region of acceptance for the simple cumulative sum test

Fig. 1.2.5. Region of acceptance for the quadratic cumulative sum test

The parameters $a$ and $c$, respectively, have to be chosen in such a way that the desired level of significance $\alpha$ is obtained approximately.




For the following $\alpha$-values the $a$-values were given as follows:

$\alpha$ | 0.01  | 0.05  | 0.10
$a$      | 1.143 | 0.948 | 0.850


To determine $c = c(\alpha, n-p)$ an approximate method was proposed. There have not yet been any theoretical statements on the power of these cumulative sum tests. A simulation study by Garbade (1977) for the case $p = 1$ seems to indicate that the quadratic cumulative sum test is better than the simple one.
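The recursive residuals and the simple cumulative sum test described above can be sketched as follows. The code assumes the special case $h(x) = (1, x)'$ ($p = 2$), for which $h'(x_t)[H_{t-1}'H_{t-1}]^{-1}h(x_t) = 1/(t-1) + (x_t - \bar x_{t-1})^2 / S_{xx,t-1}$; the function names and the data are hypothetical, and the default `a_crit = 0.948` is the tabulated value for $\alpha = 0.05$.

```python
import math

def fit_line(xs, ys):
    """OLS fit of y = a + b*x; returns (a, b, mean of xs, sum of squared x-deviations)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b, mx, sxx

def recursive_residuals(xs, ys):
    """Normed recursive residuals v_t, t = p+1, ..., n, for a straight line (p = 2)."""
    p = 2
    out = []
    for t in range(p, len(xs)):  # 0-based position t: predict from the first t observations
        a, b, mx, sxx = fit_line(xs[:t], ys[:t])
        leverage = 1.0 / t + (xs[t] - mx) ** 2 / sxx
        out.append((ys[t] - a - b * xs[t]) / math.sqrt(1.0 + leverage))
    return out

def cusum_path(xs, ys, a_crit=0.948):
    """Return the pairs (W_r, bound_r) for r = p+1, ..., n of the simple cumulative sum test."""
    p, n = 2, len(xs)
    v = recursive_residuals(xs, ys)
    sigma = math.sqrt(sum(e * e for e in v) / (n - p))  # sigma-hat(n) from S(n)
    path, w = [], 0.0
    for j, e in enumerate(v, start=1):                   # j = r - p
        w += e / sigma
        bound = 2 * a_crit * j / math.sqrt(n - p) + a_crit * math.sqrt(n - p)
        path.append((w, bound))
    return path

# Made-up stable data: a straight line with a small alternating disturbance.
xs = [float(t) for t in range(1, 21)]
ys = [1 + 0.5 * x + 0.1 * (-1) ** t for t, x in enumerate(xs, start=1)]
path = cusum_path(xs, ys)
accepted = all(abs(w) <= bound for w, bound in path)
```

The hypothesis of constancy is accepted by this sketch when `accepted` is true, i.e. when every $W_r$ stays inside the bound for its $r$.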

Hackl (1980) proposes a modification of the two cumulative sum tests. Instead of $W_r$ and $Q_r$ he starts from moving sums of recursive residuals (MOSUMs)

$$W_{r,k} = \frac{1}{\hat\sigma(n)} \sum_{t=r-k+1}^{r} v_t, \quad r = p+k, \dots, n,$$

and

$$Q_{r,k} = \frac{1}{k\,\hat\sigma_{r,k}^2} \sum_{t=r-k+1}^{r} v_t^2, \quad r = p+k, \dots, n,$$

with the variance estimator $\hat\sigma_{r,k}^2 = (n-p-k)^{-1} \left( \sum_{t=p+1}^{r-k} v_t^2 + \sum_{t=r+1}^{n} v_t^2 \right)$, where $k$ is a given natural number, $0 < k < n-p$. Here for each time $r$ we sum only the last $k$ recursive residuals $v_t$. The hypothesis of constancy is accepted iff

$$|W_{r,k}| \le m(\alpha) \quad\text{for all } r = p+k, \dots, n$$

and

$$q_l(\alpha) \le Q_{r,k} \le q_u(\alpha) \quad\text{for all } r = p+k, \dots, n,$$

respectively, where the critical values $m(\alpha)$, $q_l(\alpha)$ and $q_u(\alpha)$ are chosen in such a way that the resulting test is of level $\alpha$. The book of Hackl gives a fine review of various tests for testing constancy. He also introduces various modifications of these tests based on moving sums of recursive residuals. Using simulation experiments he compares the power of various tests in rejecting a false hypothesis. He concluded that the MOSUM test based on $W_{r,k}$ is more powerful than the simple cumulative sum test based on $W_r$. On the other hand, his investigations indicate that the power of the quadratic cumulative sum test ($Q_r$) exceeds the power of his quadratic MOSUM test ($Q_{r,k}$). For more details we refer to the very interesting book of Hackl (1980).

In connection with the cumulative sum tests let us also refer to a paper by MacNeill (1978), who investigated asymptotic distributions of cumulative sums in a model with two polynomials as state functions and equidistant observations, which are based on the usual residuals $y_t - \hat\alpha'(t)\, h(x_t)$.




1.2.3 Models with continuous state switching 
1.2.3.1 The model 


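Without restriction of generality we now assume that in model (1) $z_t = x_t \in \mathbb{R}^1$, for the experimental design $\xi = (x_1, \dots, x_n)$,

$$x_1 \le x_2 \le \dots \le x_n,$$

and that additionally the conditions

$$h_i(\gamma_i, \alpha_i) = h_{i+1}(\gamma_i, \alpha_{i+1}), \quad i = 1, \dots, r-1, \tag{18}$$

hold. Then the state functions and the change point $\gamma \in \Gamma^\xi$ define the regression function

$$f(x) = g_\vartheta(x) = h_i(x, \alpha_i) \quad\text{for } \gamma_{i-1} \le x \le \gamma_i, \quad i = 1, \dots, r.$$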


In addition to (18), continuity conditions on the first $l_i$ derivatives of the regression function $g_\vartheta(x) = g(x, \vartheta)$ with respect to $x$ at the points $\gamma_i$, $i = 1, \dots, r-1$, are sometimes also demanded. This class of models includes for instance models with continuously connected polynomials as state functions. These models play an important role in many applications. The reason for this is the relatively simple form of the functions and their good fitting properties, which are well known in spline theory (cf. Ahlberg, Nilson and Walsh, 1967). For this type of model the term 'spline regression' is used and many papers have dealt with it. Here we only refer to Poirier (1973), Wold (1974), Ertel and Fowlkes (1976), Buse and Lim (1977), Park (1978), Jupp (1978), Agarwal and Studden (1978) and Dathe and Müller (1980).

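For each fixed $\gamma \in \Gamma^\xi$ let $\mathcal{W}(\gamma)$ denote the set of all $\alpha \in A$ for which $g_\vartheta(x) = g_{(\alpha,\gamma)}(x)$ satisfies all continuity conditions. Then the parameter $\vartheta = (\alpha', \gamma')'$ varies in the space

$$\Theta^\xi := \bigcup_{\gamma \in \Gamma^\xi} \mathcal{W}(\gamma) \times \{\gamma\}.$$

Hence we have a model of the form

$$y_t = g_{(\alpha, m(\gamma))}(x_t) + \varepsilon_t, \quad t = 1, \dots, n, \quad E\varepsilon = 0, \quad D\varepsilon = \Sigma(m(\gamma))$$

with $\alpha \in \mathcal{W}(\gamma)$, $\Sigma(m(\gamma)) \in \mathcal{V}_{m(\gamma)}$, $\gamma \in \Gamma^\xi$.

The notion of identifiability of $\alpha$ given $\gamma \in \Gamma^\xi$ corresponds to the identifiability of $\alpha$ given $m = m(\gamma)$ in Section 1.2.2.1. This means that

$$g_{(\alpha_1, \gamma)}(x_t) = g_{(\alpha_2, \gamma)}(x_t), \quad t = 1, \dots, n, \quad \alpha_1, \alpha_2 \in \mathcal{W}(\gamma),$$

implies $\alpha_1 = \alpha_2$.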




In contrast to the model with abrupt changes of state, not only may the change index $m(\gamma)$ be identifiable with respect to a suitable experimental design, but, because of the continuity conditions, also the change point $\gamma$ itself.

Let us return to Example 1.2.4 with two straight lines as state functions and a continuous change of state, i.e. $a_1 + b_1\gamma = a_2 + b_2\gamma$ holds. Under the assumption that $(a_1, b_1) \ne (a_2, b_2)$, the parameter $\alpha = (a_1, b_1, a_2, b_2)'$ is identifiable from the observations and $\gamma$ is uniquely determined by $\gamma = (a_1 - a_2)/(b_2 - b_1)$. Consequently it makes sense to estimate not only $m(\gamma)$, but $\gamma$ itself.
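The relation between the line parameters and the change point under a continuous change of state can be verified with a short calculation. The sketch below uses hypothetical parameter values and a hypothetical helper name `change_point`; it solves $a_1 + b_1\gamma = a_2 + b_2\gamma$ for $\gamma$ and checks that both lines agree there.

```python
def change_point(a1, b1, a2, b2):
    """Solve a1 + b1*gamma = a2 + b2*gamma for gamma (requires b1 != b2)."""
    if b1 == b2:
        raise ValueError("parallel lines: no unique intersection")
    return (a1 - a2) / (b2 - b1)

# Hypothetical parameters: a rising line followed by a constant level.
a1, b1, a2, b2 = 1.0, 2.0, 3.5, 0.0
gamma = change_point(a1, b1, a2, b2)
print(gamma)                                # 1.25
print(a1 + b1 * gamma, a2 + b2 * gamma)     # 3.5 3.5
```

Under continuity, equal slopes would force equal intercepts as well, so the assumption $(a_1, b_1) \ne (a_2, b_2)$ guarantees that the denominator does not vanish.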

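Now we turn to models with linear state functions

$$h_i(x, \alpha_i) = \alpha_i' h_i(x) \quad (\alpha_i \in \mathbb{R}^{p_i}), \quad i = 1, \dots, r.$$

The continuity conditions imposed on the regression function $f(x) = g_\vartheta(x)$ can be represented by

$$\left.\frac{\partial^l}{\partial x^l}\, \alpha_i' h_i(x)\right|_{x=\gamma_i} = \left.\frac{\partial^l}{\partial x^l}\, \alpha_{i+1}' h_{i+1}(x)\right|_{x=\gamma_i}, \quad l = 0, 1, \dots, l_i, \quad i = 1, \dots, r-1. \tag{20}$$

The equation (20) can equivalently be described with a matrix $A(\gamma)$ by $A(\gamma)\alpha = 0$, and $\mathcal{W}(\gamma) = \mathcal{N}(A(\gamma))$. The model has the representation

$$y = H(m(\gamma))\alpha + \varepsilon, \quad E\varepsilon = 0, \quad D\varepsilon = \Sigma(m(\gamma)), \qquad \alpha \in \mathcal{N}(A(\gamma)), \quad \Sigma(m(\gamma)) \in \mathcal{V}_{m(\gamma)}, \quad \gamma \in \Gamma^\xi.$$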

Given $\gamma \in \Gamma$ the parameter $\alpha$ is identifiable and hence $A(\gamma)$-conditionally unbiased linearly estimable iff

$$r\begin{pmatrix} H(m(\gamma)) \\ A(\gamma) \end{pmatrix} = p := \sum_{i=1}^{r} p_i.$$

(Concerning the notion of $A(\gamma)$-conditional unbiasedness, see Bunke and Bunke, 1986.)


1.2.3.2 Least squares estimators 


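As in models without continuity conditions (Section 1.2.2), a WLSE $\hat\vartheta$ can be computed as $\hat\vartheta = (\hat\alpha_{\hat\gamma}', \hat\gamma')'$ in two steps. For each fixed $\gamma$, $\hat\alpha_\gamma$ is a (restricted) weighted least squares estimator with respect to the parameter space $\mathcal{N}(\gamma)$. Then $\hat\gamma$ minimizes the residual sum of squares $S(\gamma) = n\,{}^{w}|y - g_{(\hat\alpha_\gamma, \gamma)}|_n^2$ defined by $\hat\alpha_\gamma$ with respect to $\gamma \in \Gamma^{\xi}$.

For models with nonlinear state functions only iterative methods will be applicable for the computation of the WLSE $\hat\alpha_\gamma$. In the case of linear state functions $\hat\alpha_\gamma$ is given by

$$\hat\alpha_\gamma = C^{(1)}(\gamma)\, H'(m(\gamma))\, W y,$$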


1.2. Switching regression models 91 


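where the matrix $C^{(1)}(\gamma)$ is defined by the relation

$$\begin{pmatrix} H'(m(\gamma))\, W H(m(\gamma)) & A'(\gamma) \\ A(\gamma) & 0 \end{pmatrix}^{-} = \begin{pmatrix} C^{(1)}(\gamma) & C^{(2)}(\gamma) \\ C^{(3)}(\gamma) & C^{(4)}(\gamma) \end{pmatrix}$$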


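and the matrix $H(m(\gamma))\, C^{(1)}(\gamma)\, H'(m(\gamma))\, W$ is a projection matrix onto the space $H(m(\gamma))\, \mathcal{N}(A(\gamma))$ with respect to the norm ${}^{W}\|z\| = (z'Wz)^{1/2}$ (cf. [A 1.8]). If

$$r\begin{pmatrix} H(m(\gamma)) \\ A(\gamma) \end{pmatrix} = p$$

is valid,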


&, = |H'(m(y)) WH(m(y)) + Ay) Ay) | H'(m(y)) Wy 
follows by [A 1.9]. 
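For a fixed change point, the restricted weighted least squares step can be sketched numerically. The code below does not reproduce the appendix formulas cited above. It assumes the standard bordered (KKT) system for minimizing a quadratic form under the linear constraint $A\alpha = 0$; the function name `restricted_wlse` and the example data are hypothetical.

```python
import numpy as np

def restricted_wlse(H, W, y, A):
    """Minimise (y - H a)' W (y - H a) subject to A a = 0.

    Sketch based on the bordered (KKT) system
        [H'WH  A'] [a  ]   [H'Wy]
        [A     0 ] [lam] = [0   ]
    which is an assumption of this example, not the text's own formula.
    """
    p, q = H.shape[1], A.shape[0]
    K = np.block([[H.T @ W @ H, A.T], [A, np.zeros((q, q))]])
    rhs = np.concatenate([H.T @ W @ y, np.zeros(q)])
    # lstsq also returns a solution when K happens to be singular
    sol = np.linalg.lstsq(K, rhs, rcond=None)[0]
    return sol[:p]

# Hypothetical data: two straight lines joined continuously at gamma = 2
gamma = 2.0
x = np.array([0.0, 1.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 5.0])   # 1 + 2x left of gamma, constant 5 right of it
left = x <= gamma
H = np.column_stack([left, left * x, ~left, ~left * x]).astype(float)
A = np.array([[1.0, gamma, -1.0, -gamma]])   # a1 + b1*gamma - a2 - b2*gamma = 0
W = np.eye(len(x))
alpha_hat = restricted_wlse(H, W, y, A)
```

Because the example data are noise-free and satisfy the continuity condition, `alpha_hat` recovers the coefficients $(1, 2, 5, 0)$ up to rounding.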


Remark 1.2.2 If $x_t \ne \gamma_i$ holds for all $t = 1, \dots, n$ and $i = 1, \dots, r - 1$, then the mapping of the observations $x_t$ to the state functions is uniquely determined. But if there exists a point $x_t$ with $x_t = \gamma_i$ for an $i \in \{1, \dots, r - 1\}$, then because of the continuity condition on the regression function at the point $\gamma_i$ we can allocate the observation at the point $x_t = \gamma_i$ to the state function $\alpha_i' h_i(x)$ as well as to the state function $\alpha_{i+1}' h_{i+1}(x)$. Thus there are two design matrices ${}_{i}H(\gamma)$ and ${}_{i+1}H(\gamma)$. If $\hat\alpha^{i}$ and $\hat\alpha^{i+1}$ denote the corresponding WLSE, we can show because of ${}_{i}H(\gamma)\, \mathcal{N}(A(\gamma)) = {}_{i+1}H(\gamma)\, \mathcal{N}(A(\gamma))$ that $\hat\alpha^{i}$ and $\hat\alpha^{i+1}$ provide the same residual sums of squares

$${}^{w}|y - {}_{i}H(\gamma)\, \hat\alpha^{i}|_n = {}^{w}|y - {}_{i+1}H(\gamma)\, \hat\alpha^{i+1}|_n.$$


Now we turn to the problem of how to compute the WLSE $\hat\vartheta$. If $\Gamma^{\xi} = \{\gamma_0\}$, the WLSE is given by $(\hat\alpha_{\gamma_0}', \gamma_0')'$. A model with a known change point $\gamma_0$ is treated by Poirier (1973) and Buse and Lim (1977), who consider a model with continuously connected polynomials of third degree and moreover demand the continuity of the first and second derivatives with respect to $x$ of the regression function. But a model with a known change point $\gamma_0$ is an exception. Generally most of the computational effort arises from the variation of $\gamma$ in $\Gamma^{\xi}$.

For a finite set $\Gamma^{\xi} = \{\gamma^{(1)}, \dots, \gamma^{(\nu)}\}$ we can compute $\hat\gamma$ and hence a WLSE $\hat\vartheta$ immediately, although with great computational effort if $\nu$ is large. Often the parameter space is a set with infinitely many elements. A general algorithm to determine $\hat\vartheta$ for an infinite $\Gamma^{\xi}$ is not known.

Hudson (1966) investigated the determination of an OLSE $\hat\vartheta$ for a compact parameter space $\Gamma^{\xi}$ in models with continuously connected linear state functions without further continuity conditions on the derivatives. He obtained some important analytical properties, which allow a reduction of the computational effort for determining an OLSE $\hat\vartheta$. For the special case of a model with two continuously connected straight lines we get a very simple practicable algorithm for determining an OLSE $\hat\vartheta$ (Hudson, 1966; Hinkley, 1969, 1971). But Hudson's method does not lead to an applicable algorithm in every model.




In models where the derivatives of the state functions may also be required to be continuously connected at the change points, we have to resort, taking Hudson's results into consideration in certain regions, to an approximate determination of $\hat\gamma$ which is based on a given finite net $\tilde\Gamma^{\xi} \subset \Gamma^{\xi}$ of $\gamma$-values. In general, an approximate WLSE $\tilde\vartheta$ will be used, where the continuous space $\Gamma^{\xi}$ is approximated by a discrete finite set $\tilde\Gamma^{\xi}$ and where $\tilde\vartheta$ is a solution of

$${}^{w}|y - g_{\tilde\vartheta}|_n^2 = \min_{\gamma \in \tilde\Gamma^{\xi}}\ \min_{\alpha \in \mathcal{N}(\gamma)} {}^{w}|y - g_{(\alpha,\gamma)}|_n^2.$$
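The two-step search over a finite net of change points can be sketched for the simplest case of two continuously connected straight lines with unit weights. This is an illustration under stated assumptions, not the procedure of any of the cited papers: for fixed $\gamma$ the continuity condition is built into the model through the hinge term $\max(x - \gamma, 0)$, and the names `profile_rss`, `grid_search` and the test data are hypothetical.

```python
import numpy as np

def profile_rss(x, y, gamma):
    """Residual sum of squares S(gamma) of the best continuous two-line fit."""
    # For fixed gamma the continuous model is linear in (b0, b1, b2):
    #   g(x) = b0 + b1*x + b2*max(x - gamma, 0)
    X = np.column_stack([np.ones_like(x), x, np.maximum(x - gamma, 0.0)])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return float(resid @ resid)

def grid_search(x, y, net):
    """Return (gamma, S(gamma)) minimising S over the finite net."""
    rss = [profile_rss(x, y, g) for g in net]
    k = int(np.argmin(rss))
    return float(net[k]), rss[k]

# Hypothetical noise-free data with a change of slope at x = 4
x = np.arange(9, dtype=float)
y = 1.0 + 2.0 * x - 2.0 * np.maximum(x - 4.0, 0.0)
net = np.linspace(1.0, 7.0, 13)          # finite net of candidate change points
gamma_hat, s_min = grid_search(x, y, net)
```

The cost grows linearly with the size of the net, since one linear least squares problem is solved per candidate value.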


Another possible way to determine a WLSE $\hat\vartheta$ consists in using a reparametrization $\beta = \tau(\vartheta)$ based on the continuity conditions and determining the WLSE $\hat\beta$ of $\beta$ in the reparametrized model. To determine the WLSE $\hat\beta$ we will use iterative methods. Afterwards, $\hat\vartheta$ will be computed as the solution of $\hat\beta = \tau(\hat\vartheta)$. The method was used by Gallant and Fuller (1973) to compute an OLSE $\hat\vartheta$ in a model with $r$ piecewise continuously connected polynomials with continuous first and second derivatives. Furthermore, they gave sufficient conditions for the convergence of the modified Gauss-Newton iteration algorithm they used. Iterative techniques for determining an OLSE in models with continuously connected polynomials are also investigated by Jupp (1978).*

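As in the model with abrupt changes of state we can introduce minimal state lengths $\delta_i$, $i = 1, \dots, r$, and the corresponding parameter space $\Theta_\delta^{\xi} = \{\vartheta = (\alpha, \gamma) \mid |\gamma_i - \gamma_{i-1}| \ge \delta_i,\ i = 1, \dots, r,\ \alpha \in \mathcal{N}(\gamma),\ \gamma \in \Gamma^{\xi}\}$. A WLSE $\hat\vartheta^{\delta}$ with respect to $\Theta_\delta^{\xi}$ has similar properties to a WLSE $\hat\vartheta$ ($= \hat\vartheta^{0}$) with respect to $\Theta^{\xi}$.

Some asymptotic properties of $\hat\vartheta^{\delta}$ and $\hat\vartheta$, especially some sufficient conditions for their consistency, will be investigated in Section 1.2.4.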


1.2.3.3 Some testing problems 


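As in Section 1.2.2.3 we assume $\varepsilon_t$ to be normally distributed with a constant variance $\sigma^2$. We confine our attention to models with $r = 2$ linear state functions $\alpha_i' h(x_t)$, $i = 1, 2$, which only differ in the parameter.


(a) Known change point


We consider the model

$$y = H(m(\gamma))\alpha + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n), \quad \alpha \in \mathcal{N}(A(\gamma)), \quad \sigma^2 \in \mathbb{R}^{+}.$$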


* An approximate method for calculating least squares estimates of the model para- 
meters in models with changes was proposed by Bunke and Schulze (1984). This 
method is based on a differentiable approximation of the segmented regression func- 
tion and allows the use of the well-known iterative techniques for calculating least 
squares estimates in nonlinear regression. 




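As in Section 1.2.2 we investigate the test problem

$$H\colon \alpha_1 = \alpha_2 \quad \text{against} \quad K\colon \alpha_1 \ne \alpha_2.$$

Under the testability condition $r\begin{pmatrix} H(m(\gamma)) \\ A(\gamma) \end{pmatrix} = p = 2p_1$, linear hypothesis testing provides the test statistic

$$F(y) = \frac{f_2}{f_1}\, \frac{\|P_{\mathcal{K}} y - P_{\mathcal{H}} y\|^2}{\|y - P_{\mathcal{K}} y\|^2} = \frac{f_2}{f_1}\, \frac{\|y - P_{\mathcal{H}} y\|^2 - \|y - P_{\mathcal{K}} y\|^2}{\|y - P_{\mathcal{K}} y\|^2},$$

where $P_{\mathcal{K}}$ and $P_{\mathcal{H}}$ denote projection matrices onto the spaces

$$\mathcal{K} = H(m(\gamma))\, \mathcal{N}(A(\gamma)) \quad \text{and} \quad \mathcal{H} = H(m(\gamma))\, \mathcal{N}\begin{pmatrix} A(\gamma) \\ (I \,\vdots\, -I) \end{pmatrix},$$

and $f_1$ and $f_2$ are defined by

$$f_1 = 2p_1 - r[A(\gamma)] - \dim \mathcal{H} \quad \text{and} \quad f_2 = n - 2p_1 + r[A(\gamma)].$$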


(b) Known change index m = m(y) 


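Now we suppose that the change point $\gamma$ is unknown, but that we have additional information about $\gamma$ in the form of the change index $m = m(\gamma)$. We define

$$\Gamma^{\xi}_{(m)} = \{\gamma \in \Gamma^{\xi} \mid m = m(\gamma)\}$$

and obtain the model

$$y = H(m)\alpha + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n), \qquad (22)$$
$$\alpha \in \mathcal{N}(A(\gamma)), \quad \gamma \in \Gamma^{\xi}_{(m)}, \quad \sigma^2 \in \mathbb{R}^{+}.$$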


Since $\gamma$ is unknown, Sprent (1961) proposes not to test the problem $H\colon \alpha_1 = \alpha_2$ against $K\colon \alpha_1 \ne \alpha_2$ in the model (22), but in the larger model

$$y = H(m)\alpha + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n), \quad \alpha \in \mathbb{R}^{p}, \quad \sigma^2 \in \mathbb{R}^{+}. \qquad (23)$$

Thus we do not use the continuity conditions and we can apply the test (11) from Section 1.2.2.3.


We are also interested in whether in the model (22) the change takes place at a given point $\gamma_0 \in \Gamma^{\xi}_{(m)}$. The corresponding test problem

$$H\colon \gamma = \gamma_0 \quad \text{against} \quad K\colon \gamma \ne \gamma_0$$

is not testable in the model (22). Instead of the hypothesis $H\colon \gamma = \gamma_0$ in the model (22), Sprent (1961) considers the linear hypothesis $H\colon \alpha \in \mathcal{N}(A(\gamma_0))$ in the larger model (23). Under the testability assumption $\mathcal{R}(A'(\gamma_0)) \subseteq \mathcal{R}(H'(m))$, this procedure leads to the test statistic

$$F(y) = \frac{n - r[H(m)]}{r[A(\gamma_0)]}\; \frac{\|y - P_{H(m)\mathcal{N}(A(\gamma_0))}\, y\|^2 - \|y - P_{H(m)}\, y\|^2}{\|y - P_{H(m)}\, y\|^2} \sim F_{r[A(\gamma_0)],\; n - r[H(m)]}.$$

Furthermore, Sprent (1961) also examines the corresponding test problems that arise when several regression functions with changes of state are observed simultaneously.


(c) No additional information about y 


In the case of an unknown change point $\gamma \in \Gamma^{\xi}$ the literature usually deals with tests in the model with two continuously connected straight lines and identically $N(0, \sigma^2)$-distributed observation errors.

In this special model Hinkley (1969, 1971) investigates the likelihood ratio tests for testing

$$H_0\colon \alpha_1 = \alpha_2 \quad \text{against} \quad K_0\colon \alpha_1 \ne \alpha_2$$

and

$$H_1\colon \gamma = \gamma_0 \quad \text{against} \quad K_1\colon \gamma \ne \gamma_0.$$

Under the assumption $r[H(m(\gamma))] = 4$ for all $\gamma \in \Gamma^{\xi}$ the problem $H_0$ against $K_0$ is testable. The problem $H_1$ against $K_1$ is testable only under further assumptions. Under the assumptions

$$r[H(m(\gamma))] = 4 \quad \text{for all } \gamma \in \Gamma^{\xi}$$

and

$$\mathcal{N}(A(\gamma)) \cap \mathcal{N}(A(\gamma_0)) = \{\alpha = (\alpha_1', \alpha_2')' \mid \alpha_1 = \alpha_2\} \quad \text{for all } \gamma \in \Gamma^{\xi} \text{ with } \gamma \ne \gamma_0,$$

$H_1$ against $K_1$ is testable in the model that is restricted by the condition $\alpha_1 \ne \alpha_2$. The use of the restricted model only means that the case of no change of state is excluded. But the restriction is of no importance in the computation of the likelihood ratio statistic since $(I \,\vdots\, -I)\, \hat\alpha_\gamma \ne 0$ holds with probability 1 for all $\gamma \in \Gamma^{\xi}$. The likelihood ratio tests for testing $H_0$ and $H_1$, respectively, lead to the statistics


$$\Lambda_0 = \frac{1}{\sigma^2}\, (S - S(\hat\gamma))$$

and

$$\Lambda_1 = \frac{1}{\sigma^2}\, (S(\gamma_0) - S(\hat\gamma)),$$

respectively, where $S(\gamma_0) = S(\hat\alpha_{\gamma_0}, \gamma_0)$, $S(\hat\gamma) = S(\hat\alpha_{\hat\gamma}, \hat\gamma) = S(\hat\vartheta)$ and $S$ denote the residual sums of squares of the OLSE $\hat\alpha_{\gamma_0}$, $\hat\vartheta$ and the OLSE in the model without changes of state. Hinkley (1969) pointed out that $\Lambda_1$ is asymptotically $\chi^2_1$-distributed under $H_1$ and he conjectured that $\Lambda_0$ is asymptotically $\chi^2_3$-distributed under $H_0$. He proposed replacing the unknown variance $\sigma^2$ in $\Lambda_0$ and $\Lambda_1$ by a consistent $\chi^2$-distributed estimator, and approximating the distributions of the resulting test statistics by corresponding $F$-distributions.

Another asymptotic test for testing $H_0$ using a partial Bayes approach was given by Farley and Hinich (1970). As a restriction, they supposed that the change of state takes place at exactly one of the points $x_1, \dots, x_n$ and that each point may be the change point with the same probability $1/n$.

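A general test problem in a more general model was considered by Feder (1975b). In the model with $r$ continuously connected linear state functions and independently and identically but arbitrarily distributed errors $\varepsilon_t$ he considered tests to check

$$H\colon \vartheta \in \Theta_H \quad \text{against} \quad K\colon \vartheta \in \Theta_K := \Theta \setminus \Theta_H,$$

where $\Theta_H$ is a subspace of $\Theta$ with certain properties. As test statistic he used

$$\lambda = \left(\frac{S(\hat\vartheta_K)}{S(\hat\vartheta_H)}\right)^{n/2},$$

where $S(\hat\vartheta_H)$ and $S(\hat\vartheta_K)$ denote the residual sums of squares of an OLSE of $\vartheta$ under $H$ and $K$, respectively. He gave sufficient conditions for the asymptotic $\chi^2$-distribution of $-2 \ln \lambda$. For a model with two intersecting straight lines with independently and identically distributed observation errors, it follows in particular that the corresponding statistics $-2 \ln \lambda_1$ and $-2 \ln \lambda_2$ for testing

$$H_1\colon \gamma = \gamma_0 \quad \text{against} \quad K_1\colon \gamma \ne \gamma_0$$

and

$$H_2\colon \vartheta = (\alpha_1', \alpha_2', \gamma)' = (\alpha_{10}', \alpha_{20}', \gamma_0)' = \vartheta_0 \quad \text{against} \quad K_2\colon \vartheta \ne \vartheta_0$$

asymptotically have $\chi^2$-distributions, with the respective numbers of degrees of freedom, under the corresponding hypotheses. Asymptotically the commonly used statistics

$$-2 \ln \lambda_1 = n\, (\ln S(\gamma_0) - \ln S(\hat\gamma))$$

and

$$-2 \ln \lambda_2 = n\, (\ln S(\vartheta_0) - \ln S(\hat\vartheta))$$

have the same distributions.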
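The log-ratio form of these statistics is easy to evaluate once the two residual sums of squares are available. The following is a minimal sketch; the function name `neg2_log_lambda` and the numerical inputs are hypothetical and serve only to show the arithmetic.

```python
import math

def neg2_log_lambda(rss_h, rss_k, n):
    """-2 ln(lambda) = n * (ln S_H - ln S_K) for residual sums S_H >= S_K > 0."""
    if rss_k <= 0 or rss_h < rss_k:
        raise ValueError("expected rss_h >= rss_k > 0")
    return n * (math.log(rss_h) - math.log(rss_k))

# Hypothetical values: S(gamma_0) = 12.0 under H, S(gamma_hat) = 10.0 under K, n = 50
stat = neg2_log_lambda(12.0, 10.0, 50)

# The same value written through lambda = (S_K / S_H)^(n/2)
lam = (10.0 / 12.0) ** (50 / 2)
stat_via_lambda = -2.0 * math.log(lam)
```

In an application the value of `stat` would be compared with a quantile of the asymptotic reference distribution that applies to the hypothesis being tested.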

By an example Feder demonstrated that the asymptotic distribution of $-2 \ln \lambda$ may deviate from a central $\chi^2$-distribution if $\vartheta$ is not identifiable. In the model with two intersecting straight lines the change point $\gamma$ and consequently the parameter $\vartheta$ are not identifiable under $H_0\colon \alpha_1 = \alpha_2$, so that the asymptotic distribution of $-2 \ln \lambda = n\, (\ln S - \ln S(\hat\vartheta))$ under $H_0$ is not known up to now.




1.2.4 Some asymptotic results on least squares estimators 
in ordered models with state switching 


In this section we want to derive sufficient conditions for the consistency of 
the weighted least squares estimators in the models with abrupt as well as with 
continuous changes of state. The results and the technique of proof are closely 
related to the results in Section 1.1.5 on the consistency of a WLSE in general 
nonlinear models. We will show that some essential assumptions from Section 
1.1.5 are not fulfilled for the models with changes of state considered here, so 
that the following statements about consistency can not be immediately deri- 
ved from Theorem 1.1.1. 

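Analogously to Section 1.1.5 we will split the state parameters into their linear and nonlinear parts, where we will denote the linear parts by $\alpha_i$ and the nonlinear ones by $\eta_i$, $i = 1, \dots, r$ ($\alpha_i \to (\alpha_i', \eta_i')'$), i.e. for the state function we have $h_i(x, \alpha_i, \eta_i) = \alpha_i' h_i(x, \eta_i)$, $i = 1, \dots, r$. In this connection the following notation is introduced:

$$\alpha_i \in \mathbb{R}^{p_i}, \quad \eta_i \in H_i \subset \mathbb{R}^{q_i}, \quad i = 1, \dots, r,$$
$$\alpha = (\alpha_1', \dots, \alpha_r')' \in \mathbb{R}^{p}, \quad \eta = (\eta_1', \dots, \eta_r')' \in H = H_1 \times \dots \times H_r,$$
$$\beta = (\eta', \gamma')' \in H \times \Gamma =: \mathcal{B}, \quad \vartheta = (\alpha', \beta')' \in \mathbb{R}^{p} \times \mathcal{B} =: \Theta,$$
$$\Theta_\delta := \{\vartheta \in \Theta \mid \gamma \in \Gamma_\delta\}, \quad \Theta_\delta^{\xi_n} := \{\vartheta \in \Theta \mid \gamma \in \Gamma_\delta^{\xi_n}\}, \quad \Theta^{\xi_n} := \{\vartheta \in \Theta \mid \gamma \in \Gamma^{\xi_n}\},$$

where the index $n$ indicates here and in the following the dependence on the sample size $n$. Then the function $g_\vartheta(x, z)$ has the representation

$$g_\vartheta(x, z) = \alpha' h_\beta(x, z) \quad \text{with}$$
$$h_\beta(x, z) = h(x, z, \eta, \gamma) = \big(h_1'(x, \eta_1)\, 1_{[a, \gamma_1]}(z),\ h_2'(x, \eta_2)\, 1_{(\gamma_1, \gamma_2]}(z),\ \dots,\ h_r'(x, \eta_r)\, 1_{(\gamma_{r-1}, b]}(z)\big)'.$$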


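As in Section 1.1.5, we allow the inadequacy of the model, i.e. the unknown regression function $f(x, z)$ is to be approximated by a piecewise function $g_\vartheta(x, z)$. The WLSE $\hat\vartheta_n^{\delta}$ is defined to be a solution of

$$Q_n(\hat\vartheta_n^{\delta}) = \min_{\vartheta \in \Theta_\delta^{\xi_n}} Q_n(\vartheta)$$

with

$$Q_n(\vartheta) = {}^{w}|y - g_\vartheta|_n^2 = \frac{1}{n} \sum_{t=1}^{n} w_{tn}\, (y_t - g_\vartheta(x_t, z_t))^2,$$

where $\hat\vartheta_n = \hat\vartheta_n^{0}$ holds (i.e. $\delta = 0$) (cf. Section 1.2.2.2).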




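In contrast to the assumptions of Theorem 1.1.1 in Section 1.1.5:


• The parameter space $\Theta_\delta^{\xi_n}$ is not constant, but varies with the experimental design $\xi_n$; $\Theta_\delta^{\xi_n} \subset \Theta_\delta \subset \Theta$.

• If $r > 2$, then the parameter space

$$\Gamma = \{\gamma = (\gamma_1, \dots, \gamma_{r-1})' \mid c \le \gamma_1 < \gamma_2 < \dots < \gamma_{r-1} \le d\}$$

is not compact and thus the space $\mathcal{B} = H \times \Gamma$ is not compact; it holds that $\Gamma_\delta \subset \Gamma \subset \Gamma_0$ and $\mathcal{B}_\delta = H \times \Gamma_\delta \subset \mathcal{B} \subset \mathcal{B}_0 = H \times \Gamma_0$, and $\Gamma_\delta$ ($\delta > 0$) and $\Gamma_0$ are compact.

• The central assumption of the continuity of $h_\beta(x, z)$ in $\beta$ for fixed $x$ and $z$ (Ag) is violated in general, as the following example will show.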


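We consider a model with two linear state functions and $z = x$. Then $g_\vartheta(x) = \alpha' h_\gamma(x)$ holds with

$$h_\gamma(x) = \begin{pmatrix} h_1(x)\, 1_{[a, \gamma]}(x) \\ h_2(x)\, 1_{(\gamma, b]}(x) \end{pmatrix}.$$

If $h_i(x) \ne 0$, $i = 1, 2$, then $h_\gamma(x)$ for any fixed $x$ is discontinuous in $\gamma$ at the point $\gamma = x$.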
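The jump in the example can be made visible with a few lines of code. This is a minimal sketch under the hypothetical choice $h_1(x) = h_2(x) = 1$ on the interval $[a, b] = [0, 1]$; the function name `h_gamma` is illustrative.

```python
def h_gamma(x, gamma, a=0.0, b=1.0):
    """Indicator-weighted state functions for the assumed case h_1 = h_2 = 1."""
    first = 1.0 if a <= x <= gamma else 0.0    # 1_{[a, gamma]}(x)
    second = 1.0 if gamma < x <= b else 0.0    # 1_{(gamma, b]}(x)
    return (first, second)

x = 0.5
at_x = h_gamma(x, gamma=0.5)                 # change point exactly at x
just_below = h_gamma(x, gamma=0.5 - 1e-9)    # change point slightly left of x
```

With the change point at `x` the observation belongs to the first state, while an arbitrarily small shift of the change point to the left moves it to the second state, so the vector changes from `(1.0, 0.0)` to `(0.0, 1.0)`.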

Although essential assumptions from Theorem 1.1.1 in section 1.1.5 are 
thus violated, the consistency of the WLSE can be proved under similar con- 
ditions and with a similar technique as in Theorem 1.1.1. First we formulate 
the following assumptions: 


1. Let a function «: 4 > R! exist that has at most a finite number of points 
of discontinuity with 0 < x < u(z) Se for allz¢€ & and 
S, = max |w” — u(ztn)| PELE), 
1Stsn 


2. e, are independent random variables with He, = 0, 


r 
2 
He; = 0 = Dy OL yy, ya) 
s— 1 


(a) & are identically distributed with of = o? or 


(b) the 4th moments satisfy the condition }) "He! < oo. 
t=1 


3. The sequences of empirical distribution functions 
n 


F(x, 2) BA a va | ares (cot el eine Ztn) 
t=1 


yen 2) Le hen) 


t=1 




defined by the sequences of experimental designs Aen a ((%1n5 a Baas 
(25. Say) )t converge for all (a, z) € X X & to distribution functions F(a, z) 
and F(z) over  X & or &, and F(z) is continuous and strongly increasing 
on [c, d]. 

4. The regression function $f(x, z)$ is piecewise continuous and bounded, i.e. there exists a finite number of points $\tau_0 := a < \tau_1 < \dots < \tau_\nu := b$, so that $f(x, z)$ is continuous and bounded on $\mathcal{X} \times (\tau_{i-1}, \tau_i]$ for all $i = 1, \dots, \nu$.

5. Let go(v, z) = a’hg(x, z) hold with 


hg(z, z) = h(x, 2, n, y) 
= (hy(x, 1) Lia,y,3(2)» hy(x, 2) Lys yal(2)s teey h(x, Nr) Tiy,-1,0\(2))’ > 


and let all components h;;(7, 7;), 7 = 1,..., pi of hi(x, ni) be continuous on 
the compact sets 2 x H;,1 = 1,...,7. For all 6 = (n, y) € H X TI, let 


LRN ( fue he n(X, 2) hayy(x, 2) dF (a, Dee 
] 


LER i= Tene 


be a non singular matrix. 
5’. Let « = z and g(x) = a’h,(x) hold with 


hg(x) = h(x, n, y) = (hi(x, m1) atia), teey hy(x, Nr) Ty,-4,b\(2))' 


and all components h;;(x, ;) of hi(x, n;) be continuous on the compact sets 
[a,b] X H;, += 1,...,r. The continuity conditions on the function g9(x) 
can be described by a matrix A(8) by A(8) « = 0 with the property that 
the matrix A(f) is continuous in Bf € H < I’ and the matrix 


%(hg, hg) + A’(B) A(B) 


$—1) cD 
=((f J alee) hacay(a) rary() ue) +A") AB) 


Jal ,0..01?) 


is non singular for all B = (n,vy)€¢ HX I. 
6. (a) Let a unique solution 0% of 


min “|f — gol? = “lf — go5/? 
BE OS 


exist. 
(b) Let a unique solution # of 


min“|f — go|?’ = *|f — gos|? 
BEO 


exist. 
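7. There is a $\vartheta^0 \in \Theta$ with $f = g_{\vartheta^0}$.
8. For all $\vartheta, \bar\vartheta \in \Theta$, ${}^{u}|g_\vartheta - g_{\bar\vartheta}|^2 = 0$ iff $\vartheta = \bar\vartheta$.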




Here $(l, k)_n$ and ${}^{w}(l, k)_n$ are defined as in Section 1.1.5, and

$${}^{u}(l, k) := \int_{\mathcal{X} \times \mathcal{Z}} u(z)\, l(x, z)\, k(x, z)\, dF(x, z).$$

Generally we use the notation introduced in Section 1.1.5. The assumption 5' is related to the model with continuous changes of state and is an analogue of the assumption 5, which concerns models with abrupt changes of state.

The assumption 1 allows weights $w_{tn}$ that may depend on the observations and the measuring points $z_t$. The assumption is trivially fulfilled for the OLSE and the GLSE.

From 2(b) it follows in particular that 

min of < of = maxo? < oo forall ¢=1,2,... 
1S<isr 1s<i<r 


and under the assumptions 1-3 with the theorem of Helly-Bray it follows that 


n r mr(v?) 
m1) ule) of = > cif 3 we) 


t=mn(y?_,)+1 
a Dg a? f u(z) dF n (z)= Faape xe ci ee =: Ty(y°) 


so that the assumption A, (a) or (b) from Theorem 1.1.1 is satisfied, where 
we give up the modified Lindeberg condition, which is not necessary for the 
consistency. 

For all &= («’,x’,y’)/ €0 let 6(#) =y,;—yin, t= 1,...,7 and 4d(8) 
= (61(9), eo 6,(8))’ be the state lengths given by y. For 6; = (6), ..., dir)’ ds 
= (691, ---, Oo)’ let 6, S dy or 6, < 62 be defined componentwise by 6,; S 63;, 
Pires hie Se STON! O14 Ons = Agaety T 

The next theorem provides the consistency of the WLSE $\hat\vartheta_n^{\delta}$ in the case of abrupt changes of state.


Theorem 1.2.1 
1. Under assumptions 1, 2 (a) or (b), 3-5, we have 


(a) lim “lf — Yya|, = lim “lf — 93| = A? as. 
n—>0o = n—>00 ; 
with A? := min “|f — gol, 
BEO5 
(b) af there exists a OF € 6 with 
“lf — goe| = min “|f — gol =: 4 
0€0 


at follows that 
= Ay a.s. 


lim “lf = 935 
n—>0o 


— lim “lf rao, 745 
for all 6 with 0 < 6 < 6; := 6(8;) (consistency of the WILSA). 




2. Under 1, 2 (a) or (b), 3-5, we have 


(a) under 6 (a), oe + Hw, 
(b) under 6 (b), 3° + 9 for all 5 with 0 <6 S 6, := 5(9) (consistency 
of the WILSE). 
3. Under 1, 2 (a) or (b), 3-5 and 7, we have 


Ai = Ay = 0 and 


yi 


Qn (9%) > ty (y = de tf ule) a 


Ye 


for all 6 with 0 < 6 S do := 4(9) 
(consistency of the variance estimate). 
4. Under 1, 2 (a) or (b), 3-5, 7 and 8, we have 


5° ==+ 9 for all 6 with 0 < 6 S< dy := 5(9) 


(consistency of the WLSE). 
Proof. The proof essentially follows the ideas of the proof of Theorem 1.1.1. 


Therefore we can restrict ourselves here to a short sketch of the differences. 
Let the sets #) and # be defined by 
Be = HX ly Vande He t= {fhe t 1. Ds Be aie 


Since 


(1, by = mS (ey) Urs 1) Klee, 22) = f wlz) Ux, 2) bw, 2) AP lz, 2) 
t=1 LEZ 


we can show by means of an appropriate splitting of the integrals, the theorem 
by Polya, and by using the compactness of #) that 


sup ["(Z, k)n *: “¢; k)| Br 0 
Lke# 


which according to Lemma 1.1.1 yields 
sup |"(1, k), — “(I k)| + 0. (24) 
lke 

Now (24) implies 
sup || (aca) —h5aln — “hey — havin +0, t= 1,...,p, 
BEB : 

and we can show further that 


“Iheciy — hg] >0 for B>B(EB), i=1,...,p. (26) 




The statement of Lemma 1.1.2, 


sup “(l, ) —> 0 (27) 
le KH 


remains true, but because of the missing continuity property of he(a,z) (Ag) 
it must be proved in another way. As in the proof of Lemma 1.1.2 we show: 


“(1,),——>0 forall le # (28) 
and 

“ela tA) (29) 
Exploiting 


"(hays €)nl S “\haay — Agy|n “lEln + “hg iys €)n 
S|" Agi) —hzala — “hay — hgy|| mleln 
+B “\heiy — hz| “lelat “(h B(i)> E)n > 


(25), (26), (28), (29) and the compactness of 4p, it follows similarly as in the 
proof of Lemma 1.1.2 that 


sup “(hgiy, €) “+0 forany i=1,...,p, 

BEB 
from which we can derive the property (27) with the same technique as in the 
proof of Lemma 1.1.2. 

Further, we can show that “(h;,h;) and “(hg, f) are continuous in B € Bp 
= H X I. This immediately yields that og := (hg, hg) + “(hg, f) is continuous 
inB€ @=H x I. This statement can be shown only for #, but not for A 
since “(hg hg) 1 for B = (n, y) with y € Ig \ I does not exist. But we may , 
introduce ag = “(hg,hg)* “(hg,f) and & = (hg, he)n “(hp, fn. Thereby & = &(, my) 
holds. If y;_, = y; is valid, «;(8) = &;(8) = 0 follows for the ith subvectors of 
og and &,, respectively, so that 


sup ||@s — «|| = sup ||, — all ——+ 0 
BEB» pe B 
can be shown with the same technique as in Lemma 1.1.3. The continuity of 
op yoplies the continuity of d(B) := "|f — aghs| on # so that there exists a 
Bi = (nf, vi) € ®s = H XI with d(6s) = min d(6). Assumption 3 implies 
BEBS 
Filybs) — Falvise) > Fly's) — Fh) > 0 


and thus it follows that yfe I» and Q,(85) < Q,(01) for almost all ». With 
sup ||«x,|| < oo and the same arguments as in the proof of Theorem 1.1.1, the 


BEBs 
statements 1 (a) and 2 (a) follow. If there exists a # € 6 with 6, := 6(9,) > 0 




and "|f — gos| = ae “If — gel, it immediately follows that Ae := min “|f — gol 
0E05 


= A, for alld < 6, aed consequently the statements 1 (a) and 2 (b) imply the 
statements 1 (b) and 2 (b). The remaining statements are consequences of 1 
and 2 


If the assumption 5 is replaced by the assumption 5’, then the continuity of 

== ["(hp, hg) + A’(B) A(B)P! “hz, f) in B € # follows from the continuity of 

“ll, he) and “(hg, f). With the same technique of proof as in Theorem 1.2.1, the 

following theorem on the consistency of the WLSE $\hat\vartheta_n^{\delta}$ in the case of continuous changes of state can be shown.


Theorem 1.2.2 Under the assumptions 1 to 4, 5', 6 to 8 and $x = z$, the statements from Theorem 1.2.1 hold.


Remark 1.2.3 Since the continuity of $\alpha_\beta$ could only be shown for $\beta \in \mathcal{B}_\delta = H \times \Gamma_\delta$, but not for $\beta \in \mathcal{B} := H \times \Gamma$, we do not succeed in showing $\sup_{\beta \in \mathcal{B}} \|\alpha_\beta\| < \infty$. Thus we cannot prove the consistency of $\hat\vartheta_n = \hat\vartheta_n^{0}$ with the applied technique. But for the special case $r = 2$ we have $\Gamma = \Gamma_0 = \Gamma_\delta$ with $\delta = (c - a, b - d)'$ and thus the consistency statements of Theorems 1.2.1 and 1.2.2 are valid for $\hat\vartheta_n = \hat\vartheta_n^{0}$, too.

In case the matrix ${}^{u}(h_\beta, h_\beta) + A'(\beta) A(\beta)$ is nonsingular even for all $\beta \in \mathcal{B}_0$ and if the identifiability condition 8 holds not only on $\Theta$, but also on $\Theta_0$, the consistency statements are also valid for $\hat\vartheta_n = \hat\vartheta_n^{0}$ in the model with continuous changes of state and an arbitrary state number $r$.


Remark 1.2.4 Provided that the linear parameters $\alpha_i$ vary in compact sets $\mathcal{A}_i$, $i = 1, \dots, r$, we can similarly show that $\hat\vartheta_n^{\delta}$ and $\hat\vartheta_n = \hat\vartheta_n^{0}$ satisfy the consistency statements of Theorems 1.2.1 and 1.2.2.

For models with $r$ continuously connected linear state functions, Feder (1975a) investigated sufficient conditions for the weak consistency of the OLSE $\hat\vartheta_n$. He did not use the minimal state lengths $\delta_i$, but established a condition on the sequence of experimental designs $\{\xi_n\}$ which ensures that, with a probability converging to 1, the OLSE $\hat\vartheta_n$ lies in a compact set containing the 'true' parameter $\vartheta^0$ (assumption (*) and lemma 3.4 in Feder, 1975a). Moreover, he investigated the speed of convergence of the OLSE and derived sufficient conditions for the asymptotic normality of $\hat\vartheta_n$ (1975a, theorems 4.13 and 4.17). From his results it follows that in the model with two intersecting straight lines and a homogeneous variance $n^{1/2}(\hat\gamma_n - \gamma^0)$ is asymptotically normally distributed. According to Hinkley (1969), empirical studies showed that the normal distribution provides a poor approximation to the distribution of $\hat\gamma_n$ for finite sample size $n$. Therefore he suggested alternative approximations.




1.2.5 Some other models with state switching 


In the short description of other models considered in the literature we can restrict ourselves to models with $r = 2$ linear state functions $\alpha_1' h_1(x)$ and $\alpha_2' h_2(x)$ with normally distributed observation errors. Goldfeld and Quandt (1972) treat a model with $z_t = k's_t$, where $k$ is an unknown vector and $s_t$ an observable or nonobservable variable. In particular, $s_t = x_t$ may hold. Since $z_t = k's_t$ is not known, we cannot turn to an ordered model because we do not know which observations belong to which state. These assumptions lead to a model where more unknown parameters than observations occur, so that the parameters are not identifiable. That is why Goldfeld and Quandt (1972) suggest an approximate maximum likelihood estimate (D-method) of the state parameters $\alpha_1, \alpha_2, \sigma_1^2, \sigma_2^2$ and of the parameters $k$ and $\gamma$ describing the mechanism of assignment between states and observations. The method is based on a reduction of the number of the unknown parameters.

Besides models with a deterministic assignment of states to observations, Quandt (1972), Goldfeld and Quandt (1973) and Quandt and Ramsey (1978) consider models with a stochastic mechanism of assignment. In the observation model with two states they introduce the state probabilities $\lambda_{t1}$ and $\lambda_{t2} = 1 - \lambda_{t1}$, $t = 0, \dots, n$, where $\lambda_{t1}$ denotes the probability that the system under consideration is in the first state at the time $t$ (i.e. $y_t = \alpha_1' h_1(x_t) + \varepsilon_t$, $D\varepsilon_t = \sigma_1^2$ holds), and $\lambda_{t2}$ denotes the corresponding probability for the second state.

Two special cases are investigated:


(a) $\lambda_{t1} = \lambda$ for all $t = 0, 1, \dots$


(b) $\lambda_t = \begin{pmatrix} \lambda_{t1} \\ \lambda_{t2} \end{pmatrix} = \begin{pmatrix} \tau_1 & 1 - \tau_2 \\ 1 - \tau_1 & \tau_2 \end{pmatrix} \begin{pmatrix} \lambda_{t-1,1} \\ \lambda_{t-1,2} \end{pmatrix}$, $t = 1, 2, \dots$


The matrix $T = \begin{pmatrix} \tau_1 & 1 - \tau_2 \\ 1 - \tau_1 & \tau_2 \end{pmatrix}$ may be interpreted as a Markovian transition matrix, where $1 - \tau_1$ denotes the probability that the model changes from state 1 at the time $t - 1$ to state 2 at the time $t$ and $1 - \tau_2$ analogously denotes the transition probability from the state 2 to the state 1. In order to estimate the state parameters $\alpha_1, \alpha_2, \sigma_1^2, \sigma_2^2$ and the probabilities $\lambda$ and $\lambda_{01}$, $\tau_1$ and $\tau_2$, respectively, characterizing the state mechanism, Goldfeld and Quandt (1972, 1973) propose maximum likelihood estimators. For the special case (a) Quandt and Ramsey (1978) derive a further estimate based on the moment generating function. Under certain regularity assumptions this estimator is consistent and asymptotically normally distributed. Bayesian methods for estimating the state parameters and the change index $m$ in a model with two straight lines as state functions when using an improper a priori distribution were treated by Ferreira (1975) and Holbert and Broemeling (1977). An approximate Bayesian method to estimate the change point in the model with two continuously connected straight lines was introduced by Bacon and Watts (1971).
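The Markovian special case (b) can be illustrated by propagating the state probabilities forward in time. The sketch below assumes the transition matrix has rows $(\tau_1, 1 - \tau_2)$ and $(1 - \tau_1, \tau_2)$ acting on the column vector of state probabilities; the function name `state_probabilities` and the numerical values are hypothetical.

```python
import numpy as np

def state_probabilities(lam0, tau1, tau2, n):
    """Return the (n + 1) x 2 array of state probabilities lambda_0, ..., lambda_n."""
    # Assumed transition matrix: tau_i is the probability of staying in state i
    T = np.array([[tau1, 1.0 - tau2],
                  [1.0 - tau1, tau2]])
    lam = np.empty((n + 1, 2))
    lam[0] = lam0
    for t in range(1, n + 1):
        lam[t] = T @ lam[t - 1]
    return lam

# Hypothetical values: start in state 1, tau_1 = 0.9, tau_2 = 0.8
lam = state_probabilities([1.0, 0.0], 0.9, 0.8, 50)
# Limiting probability of state 1 for these tau values
limit_state1 = (1.0 - 0.8) / ((1.0 - 0.9) + (1.0 - 0.8))
```

Each row of `lam` sums to one, and for these values the probability of the first state approaches `limit_state1`, that is 2/3, as $t$ grows.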


1.2.6 Methods of identification of state switching in models 
with unknown number of states 


Now we consider the model (1) with linear state functions $\alpha_i' h(x)$, where the changes of state take place in dependence on $x$ (i.e. $z = k(x)$) and where the number of states $r$ is unknown. The same considerations as in Section 1.2.2 lead to the observation model

$$y_t = \alpha_i' h(x_t) + \varepsilon_t, \quad D\varepsilon_t = \sigma_i^2, \quad t \in A_i, \quad i = 1, \dots, r,$$

with

$$A_i = \{m_{i-1} + 1, m_{i-1} + 2, \dots, m_i\}, \quad m_0 = 0, \quad m_r = n,$$

unknown change indices

$$m_i = \max_{1 \le t \le n} \{t \mid z_t \le \gamma_i\}, \quad i = 1, \dots, r - 1,$$

and an unknown number of states $r$ from a given set $\mathcal{R}$.
Finding an estimate $\hat r$ by the least squares method as a solution of

$$S(\hat r) = \min_{r \in \mathcal{R}} S(r) \quad \text{with} \quad S(r) := S(\hat\vartheta(r)) = \min_{\vartheta \in \Theta_r^{\xi}} \sum_{t=1}^{n} (y_t - g_\vartheta(x_t))^2$$

demands a considerable computational effort. Here $\Theta_r^{\xi}$ denotes the parameter space chosen for the experimental design $\xi$ and $r$ states. For example, in a model with $n = 20$ observations, $p_i = 2$-dimensional state parameters $\alpha_i$ and the assumption that there are at least three observations within each state, as many as 406 residual sums $S(\vartheta) = \sum_{t=1}^{n} (y_t - g_\vartheta(x_t))^2$ have to be compared. In the case of $n = 25$ observations, the number of residual sums to be computed increases to 2745.
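The two counts can be reproduced by enumerating all ways of cutting the ordered observations into consecutive states. The sketch below assumes that every number of states compatible with at least three observations per state is admissible, which is an assumption about the set of admissible $r$ that the passage does not spell out; the function name `count_segmentations` is illustrative.

```python
from math import comb

def count_segmentations(n, min_obs=3):
    """Number of ways to split n ordered observations into consecutive
    states with at least min_obs observations each, summed over all r."""
    total = 0
    r = 1
    while r * min_obs <= n:
        # Compositions of n into r parts, each part >= min_obs
        total += comb(n - r * min_obs + r - 1, r - 1)
        r += 1
    return total

count_20 = count_segmentations(20)
count_25 = count_segmentations(25)
```

Under this assumption `count_20` is 406 and `count_25` is 2745, in agreement with the figures quoted above.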

Obviously $S(r)$ is a decreasing function of $r$, so that independent of the true number of states an OLSE leads to a large value $\hat r$.

McGee and Carleton (1970) and Schulze (1977b) suggested other empirical methods to estimate the number of states $r$, the change indices, and the state parameters $\alpha_i$, $\sigma_i^2$. These methods are mainly based on the use of the test (11) developed in Section 1.2.2.3 for the comparison of the states in two groups of observations. This test is applied to decide whether a given observation group and its adjacent group can be combined or not. Of course the methods depend on the chosen level of significance $\alpha$ of the tests used. The greater $\alpha$ is chosen, the sooner the hypothesis about the equality of the states in the considered observation groups is rejected and the more states are found. Theoretical statements on the power of such empirical methods do not exist and are also difficult to obtain because of the complicated structure of the methods. A comparison of the various suggested methods based on empirical investigations is still missing.

Another method to estimate $r$ based on Mallows' $C_p$-statistic was suggested by Ertel and Fowlkes (1976). Halpern (1973) considered a model with an unknown number of states $r$ and continuously connected linear state functions. He describes a Bayesian method to identify changes of state. The most restrictive assumption of his model is that the changes of state may only occur at finitely many given points $\gamma_1, \dots, \gamma_k$, where it is not known at how many and at which of these points. Starting from prior distributions in an appropriately reparametrized model, he derives conditional posterior distributions for the state parameters given the change point $\gamma$ and the posterior distribution of the change points. Besides the estimation of the change point $\gamma$ by maximization of the posterior distribution, he investigates the problem of optimal prediction.


1.3 Some topics in nonparametric regression 


1.3.1 Introduction 


In this section we review some aspects and recent results in the theory of nonparametric estimation of regression functions. Consider observations

$$y_i = f(x_i) + \varepsilon_i, \quad i = 1, \dots, n, \qquad (1)$$

where $\varepsilon_1, \dots, \varepsilon_n$ are independent identically distributed random variables with $E\varepsilon_i = 0$, $i = 1, \dots, n$.

The regression function $f$ is measured in the design points $x_i$, which belong to some interval $\mathcal{X}$ of the real line. In the preceding sections it was mostly assumed that the regression function $f$ was in a class $\{f_\vartheta \mid \vartheta \in \Theta\}$, where $\Theta$ was a subset of some finite-dimensional space. Even though the actual model was allowed to deviate from this class, estimation methods were parametric, i.e., amounted to estimating a parameter of fixed finite dimension. If the parameter space is assumed to be infinite dimensional, then the model is said to be nonparametric, abbreviated NP hereafter.

The theory of NP regression is intimately connected with other models of 
curve estimation, notably with the estimation of probability densities. We will 
neither explore all these interconnections nor attempt to survey the field of NP 
regression as a whole. We will focus on the decision-theoretic aspects of esti- 
mation. In particular, we will be concerned with asymptotic optimality of NP 
regression estimators. For small-sample results on minimax and Bayes estimators
of the regression functions we refer to the papers of Bunke (1985) and
Van der Linde (1985). 

Let us now introduce some further model assumptions and notation. We 
will regard f itself as the parameter to be estimated, and suppose 


$$f \in \mathcal{F},$$

where $\mathcal{F}$ is some subset of a function space. The set $\mathcal{X}$ is supposed to be a
compact interval in $\mathbb{R}$; let $\mathcal{X} = [0, 1]$ hereafter.

Let $y = (y_1, \ldots, y_n)'$ be the data vector. The set $\xi = \{x_1, \ldots, x_n\}$ is referred
to as the regression design, and it is assumed that $x_1 \le x_2 \le \ldots \le x_n$. Let
$f^\xi = (f(x_i))_{i=1,\ldots,n}$. In general we will be concerned with the case where $\xi$ is
nonrandom, given, and becomes dense in [0, 1] as $n \to \infty$, in a sense to be
specified later. Some results to be reviewed concern experimental planning
in the present context, where $\xi$ can be selected prior to estimation.

In the asymptotic framework we will admit design sequences such that
the design at stage n, $\{x_1, \ldots, x_n\}$, is not a subset of the design at stage n + 1. A specific role will be played by the
sequence of uniform designs $\{(i - 1)/(n - 1),\ i = 1, \ldots, n\}$. For all notation
we adopt the convention that the dependence subscript n can be dropped.

A sizeable part of the literature on NP regression deals with the case where 
$(x_i, y_i)$ are independent random pairs distributed like (X, Y), and where
$f(x) = E(Y \mid X = x)$. We remark only that with regard to asymptotic optimality
of estimators, results mostly parallel those for nonrandom $\xi$.

The particularities caused by the infinite dimensionality of $\mathcal{F}$, which we
are primarily interested in, persist if the error variables $\varepsilon_i$ follow a simple
statistical model. Thus we will always assume $\varepsilon_i \sim N(0, 1)$, $i = 1, \ldots, n$.

With regard to the choice of the loss function, a distinction can be made 
between estimating f at a point and estimating f globally. If one is interested 
in the value of f at a given point x, the loss will be, e.g. 


$$|\hat f(x) - f(x)|^2$$


for an estimator $\hat f$. For global estimation one considers a norm $\|\cdot\|$ of some function
space and a loss which is a function of $\|\hat f - f\|$. The norm $\|\cdot\|$ will be
selected among the norms of $L_p$ type, $1 \le p \le \infty$. Let

$$\|f\|_p = \Big( \int_0^1 |f(x)|^p\,dx \Big)^{1/p}, \quad 1 \le p < \infty, \qquad \|f\|_\infty = \operatorname*{ess\,sup}_{x \in [0,1]} |f(x)|$$

(where $\operatorname*{ess\,sup}_{x \in A} f(x) = \inf \{ \sup_{x \in A \setminus M} f(x) \mid \mu_L(M) = 0 \}$, with $\mu_L$ the Lebesgue measure).

$L_p$ is the associated Banach space of (equivalence classes of) real functions on
[0, 1].




As an estimator of f for sample size n we will admit any measurable function
$\hat f: [0, 1] \times \mathbb{R}^n \to \mathbb{R}$. The risk will be defined (temporarily) as

$$R_n(\hat f, f) = E_f \|\hat f - f\|_p^2$$

or, alternatively, by substituting $|\hat f(x) - f(x)|$ for $\|\hat f - f\|_p$ above. Consider the
supremal risk of an estimator

$$\varrho_n(\hat f, \mathcal{F}) = \sup_{f \in \mathcal{F}} R_n(\hat f, f)$$

and the minimax risk at stage n,

$$\Delta_n(\mathcal{F}) = \inf_{\hat f} \varrho_n(\hat f, \mathcal{F}). \qquad (2)$$

A sequence $\{\hat f_n\}$ such that $\varrho_n(\hat f_n, \mathcal{F})$ is close to $\Delta_n(\mathcal{F})$ in some sense, for
$n \to \infty$, is called an asymptotically minimax (AM) estimator. The concept of
AM optimality has proved its usefulness in parametric and NP statistical 
models as well as in robustness theory. One of its principal merits is that it 
allows the treatment of optimality in the class of all estimators, dispensing 
with restrictions such as asymptotic normality. For parametric statistical 
models, the classical local AM theorem of Hajek (1973) gave rise to an extensive 
development. Such local AM optimality statements can also be obtained for 
parametric nonlinear regression; in fact, this topic is covered in part by general 
results of Ibragimov and Khasminski (1981). 

A result of this type, in the parametric case, would be of the following form. 
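Suppose $\mathcal{F} = \{f_\vartheta \mid \vartheta \in \Theta\}$, where $\Theta$ is an open subset of $\mathbb{R}^k$. We assume that $\Theta$
is compact, and that $\vartheta_j$, $j = 1, \ldots, k$ are the Fourier coefficients of f with respect
to some orthonormal basis $\{\varphi_j\}_{j \in \mathbb{N}}$ of $L_2$. Suppose that $\xi$ is the uniform design
of size n, and that $\hat\vartheta_n$ is the MLE of $\vartheta$. Under standard regularity conditions one
has

$$\mathcal{L}(n^{1/2}(\hat\vartheta_n - \vartheta) \mid \vartheta) \Rightarrow N(0, V_\vartheta), \quad \vartheta \in \Theta, \qquad (3)$$

where $V_\vartheta$ is the asymptotic covariance matrix. Typically $V_\vartheta$ is of the form

$$V_\vartheta^{-1} = \int_0^1 \frac{\partial f_\vartheta(x)}{\partial \vartheta} \Big( \frac{\partial f_\vartheta(x)}{\partial \vartheta} \Big)' dx.$$

Since $\vartheta_j$ are the Fourier coefficients,

$$\frac{\partial f_\vartheta(x)}{\partial \vartheta_j} = \varphi_j(x), \quad \text{so that } V_\vartheta = I_k, \quad \vartheta \in \Theta.$$

Under some additional regularity conditions, (3) implies

$$n E_\vartheta \|\hat\vartheta_n - \vartheta\|^2 \to \operatorname{tr} V_\vartheta = k. \qquad (4)$$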


Also, since $\{\varphi_j\}$ is an orthonormal basis, for $\hat f_n = f_{\hat\vartheta_n}$,
$$\|\hat f_n - f\|_2^2 = \|\hat\vartheta_n - \vartheta\|^2. \qquad (5)$$


Since $\Theta$ is compact, it is also a typical result that the convergence (4) holds
uniformly over $\vartheta \in \Theta$. Then (4) and (5) imply (for p = 2)

$$n \varrho_n(\hat f_n, \mathcal{F}) \to k.$$

The corresponding version of Hajek’s AM theorem, stating asymptotic efficiency
of $\hat f_n$, would be

$$\liminf_n n \Delta_n(\mathcal{F}) \ge k.$$
The latter two relations describe the solution of the asymptotic efficiency
problem in this simple parametric case:

$$\lim_n n \Delta_n(\mathcal{F}) = k. \qquad (6)$$
This equality, which can be written $\Delta_n(\mathcal{F}) = n^{-1} k (1 + o(1))$, $n \to \infty$, specifies
$n^{-1}$ as the rate of convergence of the minimax risk to zero. But it contains more,
viz. also the constant k describing the exact asymptotics.

Consider now a nonparametric model. An infinite-dimensional set $\mathcal{F}$ which
is sufficiently rich will, for each k, contain a set $\mathcal{F}_k$ isomorphic to a compact
subset of $\mathbb{R}^k$. Since $\Delta_n(\mathcal{F}) \ge \Delta_n(\mathcal{F}_k)$, applying (6) to each $\mathcal{F}_k$ shows that
$$\lim_n n \Delta_n(\mathcal{F}) = \infty. \qquad (7)$$
Consequently, $n^{-1}$ is no longer the rate of convergence of the minimax risk.
Note that the relation (7), which was derived for squared $L_2$-risk (p = 2), holds
also for $1 \le p \le \infty$ and for the risk at a point. Now, to solve the asymptotic
efficiency problem in the nonparametric case, the behaviour of $\Delta_n(\mathcal{F})$ has to be
specified. A first question is whether $\Delta_n(\mathcal{F}) \to 0$. For smoothness classes $\mathcal{F}$
such as


$$W(1, 2, L) = \{ f \mid f \text{ absol. continuous},\ \|f\|_2^2 + \|f'\|_2^2 \le L \}$$


this is the case, as will be clarified later. To see which classes $\mathcal{F}$ can
basically qualify, we refer to results of Ibragimov and Khasminski (1980a).
These authors studied a continuous time analogue of NP regression (see (25)
below), and found that for $\Delta_n(\mathcal{F}) \to 0$ (in the case p = 2) it is necessary
that $\mathcal{F}$ has some compactness property in the space $L_2$. Classes like W(1, 2, L)
are compact in this sense for $L < \infty$, while the class of all differentiable f
and also the unit ball in $L_2$ are not. In fact, these results concern the existence
of uniformly consistent estimators of f:

$$\sup_{f \in \mathcal{F}} P_f(\|\hat f_n - f\|_2 > \varepsilon) = o(1), \quad n \to \infty, \quad \forall \varepsilon > 0.$$


In parametric theory, the role of compactness conditions with uniform con- 
sistency results is well known. 

Once $\Delta_n(\mathcal{F}) = o(1)$ has been clarified, interest centres on the rate of convergence
to zero. For describing rates of convergence, we introduce the following
notation. Two sequences $\{a_n\}$, $\{b_n\}$ are weakly equivalent, or $a_n \sim b_n$, if

$$C_1 \le a_n / b_n \le C_2 \quad \text{for } n \ge C_3,$$

for some positive constants $C_i$. Also $b_n$ is called a rate of convergence of $a_n$.
We note that there are results in NP regression where $n^{-1}$ is the rate of convergence
of the risk. This is the case, for instance, if the loss is defined in terms
of the function $F(x) = \int_0^x f(t)\,dt$, as e.g. $\|\hat F - F\|_2^2$. A candidate for an optimal
estimator is the stochastic process

$$Y(x) = n^{-1} \sum_{j \le nx} y_j, \quad x \in [0, 1],$$

since $n^{1/2}(Y(x) - F(x))$ has a limiting distribution (at least if $\xi$ is uniform).
The problem is very similar to that of estimating a distribution function (see
e.g. Millar, 1979), and the theory of limits of experiments applies in analogy
to the parametric case. Millar (1982) treated the NP regression model for a
related loss function, where also $n^{1/2}$-consistency applies. Earlier related
references are Makowski (1974) and Beran (1982). Observe that the norms
$\|f\|_p$ are stronger than $\|F(\cdot)\|_p$. Though the operation $f \mapsto F$ is continuous
in $L_p$, inverting it would be incorrect in a statistical context since the inverse
is not continuous. This sheds some more light on the slower rate of convergence
for the norm $\|\cdot\|_p$.
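The partial-sum estimator of the integrated regression function can be illustrated by a minimal sketch. It assumes the reading $Y(x) = n^{-1} \sum_{j \le nx} y_j$ of the display above; the function name `cumulative_sum_estimator` is hypothetical and not taken from the text.

```python
import math


def cumulative_sum_estimator(y, x):
    """Partial-sum estimate Y(x) = n^{-1} * sum_{j <= n x} y_j of F(x) = int_0^x f(t) dt.

    y : observations y_1, ..., y_n ordered by their design points
    x : a point in [0, 1]
    """
    n = len(y)
    upper = min(n, math.floor(n * x))  # number of indices j with j <= n x
    return sum(y[:upper]) / n
```

For noise-free data from f(t) = 2t on a uniform design, the value at x = 0.5 is close to F(0.5) = 0.25, with a discretization error of order $n^{-1}$.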

The latter consideration also reveals a relationship to the theory of ‘ill- 
posed’ or ‘incorrect’ problems in analysis. We mention the work of Fedotov 
(1981, 1982) exploring this aspect of statistical curve estimation. 


1.3.2 Optimal rates of convergence 


At this point it is appropriate to mention the connection of the present topic 
with the theory of the approximation of functions. In this branch of mathe- 
matics, many methods have been developed for the task of recovering a function 
f from data $f(x_i)$, $i = 1, \ldots, n$, or from noisy data. Among these are linear interpolation
methods, which basically assume non-noisy data. In the NP regression
model, linear interpolation algorithms are of interest in the case of replicated 
observations, where the error in each point of observation can be made small. 
If the design is not a replicated one, linear smoothing methods are appropriate. 
Consider a family of linear operators 


$$\{S_{n,r},\ n \in \mathbb{N},\ r > 0\}$$

where each $S_{n,r}$ assigns a function on [0, 1] to a data vector $y \in \mathbb{R}^n$. Here r is
a real parameter which describes the degree of smoothing, such that large r
correspond to little smoothing. For instance, $S_{n,r}$ could be a kernel estimator
with bandwidth $r^{-1}$, or a truncated Fourier series of length r fitted to the data
y. More precisely, if $\{\varphi_j\}_{j \in \mathbb{N}}$ is an orthonormal system in $L_2$, $f_j = (\varphi_j, f)$, one can
form empirical Fourier coefficients

$$\hat f_j(y) = n^{-1} \sum_{i=1}^n \varphi_j(x_i)\, y_i$$

and set

$$S_{n,r}(y) = \sum_{j=1}^r \hat f_j(y)\, \varphi_j.$$

The risk evaluation for such estimators is analogous to the one for orthogonal
series estimators of densities. Usually, one knows from approximation
theory that, for $f \in \mathcal{F}$, the operator $S_{n,r}$ applied to the vector $f^\xi$ of noise-free values $f(x_i)$ reproduces f with a certain accuracy:

$$\|f - S_{n,r}(f^\xi)\|_2 = O(r^{-\beta}), \quad n, r \to \infty, \quad \beta > 0,$$
uniformly over $\mathcal{F}$ if r = o(n). Obviously,

$$E_f S_{n,r}(y)(x) = S_{n,r}(f^\xi)(x), \quad x \in [0, 1].$$
On the other hand, we have

$$\operatorname{Var} \hat f_j(y) = n^{-2} \sum_{i=1}^n \varphi_j^2(x_i).$$

Under some conditions on the basis and $\xi$, $n^{-1} \sum_{i=1}^n \varphi_j^2(x_i)$ will be close to one; then
$$\operatorname{Var} \hat f_j(y) = n^{-1}(1 + o(1)).$$
Now, for the risk of the estimator $\hat f = S_{n,r}(y)$ one obtains
$$E_f \|\hat f - f\|_2^2 = \|f - S_{n,r}(f^\xi)\|_2^2 + E_f \|S_{n,r}(y - f^\xi)\|_2^2 \le O(r^{-2\beta}) + \sum_{j=1}^r \operatorname{Var} \hat f_j(y) = O(r^{-2\beta}) + r n^{-1}(1 + o(1)).$$

We see that the last expression decreases with maximal speed if r is chosen such
that $r^{-2\beta} \sim r n^{-1}$, i.e. if $r \sim n^{1/(2\beta+1)}$. Then

$$E_f \|\hat f - f\|_2^2 = O(n^{-2\beta/(2\beta+1)})$$
uniformly over $f \in \mathcal{F}$, so that
$$\varrho_n(\hat f, \mathcal{F}) = O(n^{-2\beta/(2\beta+1)}). \qquad (8)$$
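The bias-variance tradeoff behind (8) can be made concrete with a small sketch of a truncated orthogonal series estimator. Everything specific here is an assumption made for illustration: the cosine system (indexed from 0 rather than 1), the test function, unit noise variance, and the helper names `cosine_basis`, `choose_r`, `empirical_coefficients`, `projection_estimate` and `demo`.

```python
import math
import random


def cosine_basis(j, x):
    """Orthonormal cosine system on [0, 1]: phi_0 = 1, phi_j = sqrt(2) cos(pi j x)."""
    return 1.0 if j == 0 else math.sqrt(2.0) * math.cos(math.pi * j * x)


def choose_r(n, beta):
    """Balance squared bias r^(-2 beta) against variance r / n: r ~ n^(1 / (2 beta + 1))."""
    return max(1, round(n ** (1.0 / (2.0 * beta + 1.0))))


def empirical_coefficients(xs, ys, r):
    """Empirical Fourier coefficients n^{-1} sum_i phi_j(x_i) y_i for j = 0, ..., r - 1."""
    n = len(xs)
    return [sum(cosine_basis(j, x) * y for x, y in zip(xs, ys)) / n for j in range(r)]


def projection_estimate(coeffs, x):
    """Evaluate the truncated series sum_j coeffs[j] * phi_j(x)."""
    return sum(c * cosine_basis(j, x) for j, c in enumerate(coeffs))


def demo(n=400, beta=2, seed=0):
    """Simulate y_i = f(x_i) + eps_i on the uniform design and return (r, grid MSE)."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    f = lambda x: 1.0 + math.cos(math.pi * x)  # illustrative smooth regression function
    ys = [f(x) + rng.gauss(0.0, 1.0) for x in xs]
    r = choose_r(n, beta)
    coeffs = empirical_coefficients(xs, ys, r)
    grid = [k / 100 for k in range(101)]
    mse = sum((projection_estimate(coeffs, x) - f(x)) ** 2 for x in grid) / len(grid)
    return r, mse
```

Calling `demo()` returns the chosen truncation point together with the average squared error on a grid; increasing n while keeping r tied to $n^{1/(2\beta+1)}$ should shrink the error roughly at the rate in (8).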


All rate of convergence results for linear estimators in NP regression are basically
obtained in this way, i.e., by separate evaluation of the bias and variance
parts of the risk and a tradeoff between them. This applies also to loss functions
$\|\hat f - f\|_p^q$, $1 \le p \le \infty$, $q \ge 1$. In the most general case one considers a risk

$$E_f\, l(\delta_n \|\hat f - f\|_p)$$

where $l: \mathbb{R}_+ \to \mathbb{R}_+$ is a monotone function, and $\delta_n \to \infty$ is a norming sequence.
The complement to the result (8) would be that the rate $n^{-2\beta/(2\beta+1)}$ is the
best possible:

$$\Delta_n(\mathcal{F}) \sim n^{-2\beta/(2\beta+1)}. \qquad (9)$$

Estimates of the minimax risk $\Delta_n(\mathcal{F})$ from below are usually derived from the
Bayes risk for an appropriate sequence of prior distributions. Many variants
using discrete or continuous priors have been applied in the literature; we cite
the result of Ibragimov and Khasminski (abbreviated IKh hereafter) (1980b,
1982a), which concerns the global $L_p$-risk.

Suppose the bound (9) is proven; it is valid for a particular sequence of 
designs $\{\xi\}$. The question arises whether the optimal rate depends on the particular
design sequence, and whether the rate (9) can possibly be improved by
experimental planning. IKh answered this in the negative (for global loss)
by proving a result like (9) for a whole class of designs, including randomized
and sequential ones. To define this class, let $(\Lambda, \mathfrak{A}, P)$ be some basic probability
space, and let $\mathfrak{A}_n \subset \mathfrak{A}$ be the $\sigma$-algebra generated by $(x_1, y_1), \ldots, (x_n, y_n)$.
If $(x_1, y_1), \ldots, (x_{j-1}, y_{j-1})$ are given, the next design point $x_j$ is chosen by
randomization according to a law

$$\mathcal{L}_j(x_j \mid \mathfrak{A}_{j-1}). \qquad (10)$$

Since that choice is to be made by the statistician, it is natural to require that
$\mathcal{L}_j$ does not depend on the unknown f. Any array

$$\Pi_n = (\mathcal{L}_1(\cdot \mid \cdot), \ldots, \mathcal{L}_n(\cdot \mid \cdot)),$$

where the $\mathcal{L}_j(\cdot \mid \cdot)$ are of form (10), will be called an admissible design; their
entirety will be denoted by $\mathfrak{M}_n$. For Theorem 1.3.1 following, let the previous
assumptions on (1) be modified accordingly.

Let us introduce some classes of differentiable functions. For any natural $\beta$
set $C(\beta, L) = \{f \mid f^{(\beta)}$ exists, is continuous, $\|f\|_\infty + \|f^{(\beta)}\|_\infty \le L\}$. For the lower
risk bound it is convenient to consider the subclass $C_0(\beta, L)$ of those f in $C(\beta, L)$
which have support contained in (0, 1).

It is also of interest to consider infinitely differentiable or analytic functions.
Let $A(\beta, L)$ be the class of functions on $\mathbb{R}$ which are periodic with period 1,
and admit an extension to the complex domain $|\mathrm{Im}\, z| < \beta$ such that they are
analytic and bounded in modulus in this domain by L. By $A(\beta, L)$ we shall
also denote the class of restrictions of such functions to the interval [0, 1].


Theorem 1.3.1 (IKh, 1980b, 1982a). Let $l: \mathbb{R}_+ \to \mathbb{R}_+$ be a monotone function
such that l(0) = 0. For a given function class $\mathcal{F}$, a sequence $\{\delta_n\}$ and p, $1 \le p \le \infty$,
define
$$\tau = \liminf_n \inf_{\hat f} \inf_{\Pi_n \in \mathfrak{M}_n} \sup_{f \in \mathcal{F}} E_f\, l(\delta_n \|\hat f - f\|_p), \qquad (11)$$
where the inner infimum is over all admissible designs. In each of the cases (a)-(d) below, there is a constant C > 0 not depending on
L such that $\tau \ge l(C)/2$.


(a) $\mathcal{F} = C_0(\beta, L)$, $1 \le p < \infty$, $\delta_n = n^{\beta/(2\beta+1)}$

(b) $\mathcal{F}$ as in (a), $p = \infty$, $\delta_n = (n/\ln n)^{\beta/(2\beta+1)}$

(c) $\mathcal{F} = A(\beta, L)$, $1 \le p < \infty$, $\delta_n = (n/\ln n)^{1/2}$

(d) $\mathcal{F}$ as in (c), $p = \infty$, $\delta_n = (n/((\ln n)(\ln \ln n)))^{1/2}$


For estimating f at a given point x, the best possible design would be con- 
centrated in the point x, whence the problem would be a parametric one. There- 
fore, NP risk bounds for estimating f(x) should assume a given design, close 
to a uniform one in some sense. The result of Stone (1980) concerns the case 
where the x; are random. 


Theorem 1.3.2 (Stone, 1980) Suppose that $\{x_i\}$ are i.i.d. random variables
with values in [0, 1], independent of $\{\varepsilon_i\}$, which have a density that is bounded
away from zero and infinity on [0, 1]. Let $x \in [0, 1]$, l be a function as in Theorem
1.3.1. For a given function class $\mathcal{F}$ and a sequence $\delta_n$ define

$$\tau = \liminf_n \inf_{\hat f} \sup_{f \in \mathcal{F}} E_f\, l(\delta_n |\hat f(x) - f(x)|).$$

If $\mathcal{F} = C(\beta, L)$, $\delta_n = n^{\beta/(2\beta+1)}$, then there is a constant C > 0 not depending
on L such that $\tau \ge l(C)/2$.


We remark that this lower bound also holds if $\{\xi\}$ is the sequence of nonrandom
uniform designs.

Let us now turn to the question of how to attain these risk bounds. The 
bounds of Theorem 1.3.1 admit experimental planning; the optimal estimators 
given by IKh (1980b, 1982a) are based on experimental planning in the form
of a replicated observation scheme. But it is also of interest to show the attainment
of the optimal rate for more irregular design sequences given in advance;
we will come to this later. Attainment can be shown for smoothness classes which
are wider than $C(\beta, L)$, involving generalized derivatives. Define the Sobolev
class $W(\beta, p, L)$ for $\beta \in \mathbb{N}$, $1 \le p \le \infty$ by $W(\beta, p, L) = \{f \mid f^{(\beta-1)}$ exists, is
absolutely continuous, $\|f\|_p^p + \|f^{(\beta)}\|_p^p \le L\}$.

Observe that each class $W(\beta, p, L)$ contains a class $C(\beta, L)$ with the same $\beta$
but possibly different L. The design scheme is of the following form. Let $r \in \mathbb{N}$,
even, and $x_j = (j - 1)/(2r + 1)$, $j = 1, \ldots, 2r + 1$. Observations are

$$y_j = f(x_j) + m^{-1/2} \varepsilon_j, \quad j = 1, \ldots, 2r + 1, \quad m \in \mathbb{N}, \qquad (12)$$


obtained as averages from m replications. Then $n = (2r + 1) m$ is the total
number of observations. Consider an estimator

$$\hat f_{m,r}(x) = (2r + 1)^{-1} \sum_{j=1}^{2r+1} y_j V_r(x - x_j),$$

where $V_r$ is the trigonometric Vallée-Poussin kernel: $V_r(t/2\pi) = r^{-1} (\cos((r/2 + 1) t) - \cos((r + 1) t)) / \sin^2(t/2)$.
$\{V_r\}$ is a sequence of functions (trigonometric
polynomials) tending to the delta function at zero. This estimator is
based on approximation theory by means of trigonometric polynomials, and
hence will be good when the regression function satisfies periodic boundary
conditions. Consider the periodic Sobolev classes:

$$\tilde W(\beta, p, L) = \{ f \in W(\beta, p, L) \mid f^{(k)}(0) = f^{(k)}(1),\ k = 0, \ldots, \beta - 1 \}. \qquad (13)$$
Note that $\tilde W(\beta, p, L) \supset C_0(\beta, L')$ for some $L' > 0$.


Theorem 1.3.3 (IKh 1980b, 1982a). Suppose that observations are given by
the model (12), and that $l: \mathbb{R}_+ \to \mathbb{R}_+$ is a measurable function such that
$l(x) \le C \exp(x^2)$. In each of the cases (a)-(d) of Theorem 1.3.1, the numbers m and
r can be chosen such that

$$\inf_{C > 0} \sup_n \sup_{f \in \mathcal{F}} E_f\, l(C \delta_n \|\hat f_{m,r} - f\|_p) < \infty. \qquad (14)$$
If the additional condition $\beta > 1/p$ is fulfilled, then this statement is also true
with $C_0(\beta, L)$ replaced by the periodic Sobolev class $\tilde W(\beta, p, L)$ of (13).


Even wider function classes (Hölder classes in $L_p$) can be admitted in these
results, allowing also non-integer degree of smoothness $\beta$.

We see that these results solve the asymptotic efficiency problem, at the 
rate of convergence level, for the cases considered. (For the estimation of f 
at a point, attainment of the risk bound of Theorem 1.3.2 was established by 
Stone (1980)). Nevertheless, it is still desirable to remove the assumption of 
special experimental planning with Theorem 1.3.3, and also the assumption 
of periodic boundary conditions fulfilled by the regression function. Various 
results in this respect are available in the literature, mostly based on families of 
linear smoothing operators $\{S_{n,r},\ n \in \mathbb{N},\ r > 0\}$. Among these, all well-known
NP estimation techniques are represented, such as kernel methods, piecewise
polynomial and spline smoothing. We shall at first note a result of Stone (1982)
concerning weighted least squares estimation by piecewise polynomials.

The smoothing operators $S_{n,r}$ are constructed as follows. Let $r \in \mathbb{N}$ and
$\mathcal{P}_{m,r}$ be the class of functions which are polynomials of degree at most $m - 1$
on each interval $[(j - 1)/r, j/r)$, $j = 1, \ldots, r$. Note that functions in $\mathcal{P}_{m,r}$ can
have jumps. A least squares fit to the data by such functions is considered. For 
the asymptotics, one should require that n is the effective observation number 
also locally, everywhere on the interval. This can be formalized as follows. 




For natural d, let $\{Q_i,\ i = 1, \ldots, d\}$ be a partition of [0, 1] into nonintersecting
intervals of length $d^{-1}$, and set
$$\nu(d) = \sup_{1 \le i \le d} \frac{\mu_L(Q_i)}{|Q_i \cap \xi|\, n^{-1}},$$
where $\mu_L$ is the Lebesgue measure and $|\cdot|$ is the cardinality of a set. We assume the existence of a sequence
$\{d\} = \{d_n\}$ such that

$$r = o(d), \quad \nu(d) = O(1) \quad \text{as } n, r \to \infty. \qquad (15)$$

This condition does not exclude considerable local deviations of the design
from the uniform grid with step $n^{-1}$. Thus, in the least squares criterion,
weights $w_j$ should be used:

$$w_j = \mu_L(Q_i) / |Q_i \cap \xi| \quad \text{for } x_j \in Q_i, \quad j = 1, \ldots, n.$$

Consider the minimization problem over the piecewise polynomials $\mathcal{P}_{m,r}$:

$$\min \Big\{ \sum_{j=1}^n w_j (y_j - g(x_j))^2 \;\Big|\; g \in \mathcal{P}_{m,r} \Big\}. \qquad (16)$$
If condition (15) is fulfilled, then this problem has a unique solution $\hat f_{m,r}$ for
all sufficiently large n.
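For the simplest case m = 1 (piecewise constants), the weighted least squares problem (16) has a closed-form solution: on each interval the fit equals the weighted mean of the observations falling into it. The sketch below illustrates only this special case; the weights follow the definition $w_j = \mu_L(Q_i)/|Q_i \cap \xi|$ as read above, and the function names `design_weights` and `regressogram` are hypothetical.

```python
def design_weights(xs, d):
    """Weights w_j = mu_L(Q_i) / |Q_i ∩ design| for x_j in Q_i = [(i-1)/d, i/d)."""
    cells = [min(int(x * d), d - 1) for x in xs]  # the point x = 1 goes to the last cell
    counts = [0] * d
    for c in cells:
        counts[c] += 1
    return [(1.0 / d) / counts[c] for c in cells]


def regressogram(xs, ys, r, weights):
    """Weighted least squares fit by functions constant on each [(k-1)/r, k/r).

    Returns a callable; on an interval without design points the fit is NaN,
    which mirrors the fact that the minimizer is not unique there.
    """
    num = [0.0] * r
    den = [0.0] * r
    for x, y, w in zip(xs, ys, weights):
        k = min(int(x * r), r - 1)
        num[k] += w * y
        den[k] += w
    levels = [num[k] / den[k] if den[k] > 0 else float("nan") for k in range(r)]

    def fit(x):
        return levels[min(int(x * r), r - 1)]

    return fit
```

Higher orders m > 1 require solving a small weighted linear least squares system on each interval; that step is omitted here.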


Theorem 1.3.4 Suppose that observations are given by the model (1), and that l
is a function as in Theorem 1.3.3. If $\hat f_{m,r}$ is the minimizer of (16), then relation
(14) is valid in each of the cases:
(a) $\mathcal{F} = W(\beta, p, L)$, $1 \le p < \infty$, $\beta > 1/p$, $\delta_n = n^{\beta/(2\beta+1)}$, $m > \beta - 1$
(m fixed), $r \sim n^{1/(2\beta+1)}$, $\nu(d) = O(1)$, $d/r \to \infty$.
(b) $p = \infty$, $\delta_n = (n/\ln n)^{\beta/(2\beta+1)}$, all else as in (a).


This result (a slight modification of that of Stone, 1982) provides, for finite
smoothness $\beta$, the same properties of the estimators as Theorem 1.3.3. Moreover,
only the minimal condition (15) is required for the design, and no boundary
conditions are imposed on f.

The piecewise polynomial estimator of Theorem 1.3.4 has the drawback
that it is not smooth. Introducing additional smoothness conditions on the
functions in $\mathcal{P}_{m,r}$ would lead to spline estimators. Before discussing these,
however, let us briefly review the kernel method, following Gasser and Müller
(1979). Consider a function $K: \mathbb{R} \to \mathbb{R}$ with the properties

$$\int K(x)\,dx = 1, \qquad \int K(x)\, x^j\,dx = 0, \quad j = 1, \ldots, m - 1. \qquad (17)$$


Kernel estimation is based on the approximation properties of the family of
convolution operators
$$(K_r * f)(x) = \int_0^1 K_r(x - t) f(t)\,dt, \quad K_r(x) = r K(rx), \quad r > 0.$$

There are various possibilities for approximating the convolution integral
from the data, leading to different kernel estimators. The variant

$$\hat f(x) = \sum_{j=1}^n K_r(x - x_j)\, y_j \Big/ \sum_{j=1}^n K_r(x - x_j)$$

requires a stronger uniformity condition on $\xi$ to be rate-optimal. In the literature,
this is extensively treated for the case of random i.i.d. design points
$x_j$ (see e.g. Collomb, 1981). Another possibility is

$$\hat f(x) = \sum_{j=1}^n K_r(x - x_j)\, y_j w_j; \qquad (18)$$

this estimator is similar to one of Priestley and Chao (1972). Gasser and Müller
(1979) proposed another method, as follows. Let $y^*(t)$ be the random function
on [0, 1] which, for $t \in Q_i$, equals the average of those $y_j$ for which $x_j \in Q_i$. Let

$$\hat f(x) = \int K_r(x - t)\, y^*(t)\,dt. \qquad (19)$$


In general, for regression on [0,1], a kernel K with compact support (say 
$[-1, 1]$) has to be used. Moreover, in the boundary regions $[1 - r^{-1}, 1]$ and
$[0, r^{-1}]$, one should use one-sided kernels to overcome edge effects, i.e., kernels
with support [0, 1] or $[-1, 0]$, respectively, fulfilling (17). Gasser and Müller
(1979) established some optimal rate results for such estimators, based on 
(18) or (19). We note a generalization as follows. 
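A weighted kernel sum of the type (18) is short enough to sketch directly. The example below is an illustration under stated assumptions, not the construction of Gasser and Müller: it uses the Epanechnikov kernel (which integrates to one and has vanishing first moment, i.e. it satisfies (17) with m = 2), takes $w_j = 1/n$ by default as is natural for a uniform design, and applies no boundary modification. The names `epanechnikov` and `kernel_estimate` are hypothetical.

```python
def epanechnikov(u):
    """Kernel K(u) = 0.75 (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_estimate(x, xs, ys, r, weights=None, kernel=epanechnikov):
    """Weighted kernel sum: sum_j K_r(x - x_j) y_j w_j with K_r(u) = r K(r u).

    r is the inverse bandwidth; weights default to 1/n (uniform design assumed).
    Only meant for interior points x in [1/r, 1 - 1/r].
    """
    n = len(xs)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(r * kernel(r * (x - xj)) * yj * wj
               for xj, yj, wj in zip(xs, ys, weights))
```

At interior points the estimate of a constant or linear function is nearly unbiased; near the endpoints the sum loses mass, which is the edge effect that the one-sided kernels mentioned above are designed to correct.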


Theorem 1.3.5 Let $\hat f_{m,r}$ be the estimator (19), boundary modified as described,
where the three kernels used are square-integrable in addition. Then Theorem 1.3.4
holds with this meaning of $\hat f_{m,r}$.


Müller (1984) constructed appropriate kernels that are, in addition, smooth
(on $\mathbb{R}$), which allows the estimator to be made as smooth as f itself. The latter
two authors also considered the constants appearing in the asymptotic risk, 
and addressed optimality within classes of kernels. A further reference on 
kernel estimators of type (19) is Cheng and Lin (1981). For an application of 
the method of orthogonal series in the present context, we refer to Koryakin 
(1983). 


1.3.3 Spline smoothing


Splines are a sophisticated tool of approximation theory; their connection 
with the finite element method in numerical analysis has stimulated their 
development. Some of the procedures, such as the Reinsch-Schoenberg smooth- 
ing spline, have been specifically proposed for the case of noisy data, i.e., for 
NP regression. We shall attempt to survey what is known with regard to 
statistical optimality. 




Those elements of the space of piecewise polynomials $\mathcal{P}_{m,r}$ which have some
continuity or smoothness properties are splines (with equidistant knots). A
least squares problem similar to (16) for the class of splines $\mathcal{S}_{m,r} = \mathcal{P}_{m,r} \cap C^{m-2}$
was considered by Agarwal and Studden (1980). A convenient basis of $\mathcal{S}_{m,r}$
is formed by the B-splines. Let $B_0$ be the m-fold convolution power of the indicator
of $[0, r^{-1}]$, let $B_j(x) = B_0(x - j r^{-1})$, $x \in \mathbb{R}$ for each integer j. Each $B_j$
is a B-spline with support $[j r^{-1}, (j + m) r^{-1}]$; the set of restrictions to [0, 1]
of the functions $B_j$, $j = -m + 1, \ldots, r - 1$ spans $\mathcal{S}_{m,r}$. Consider the minimization
problem

$$\min \Big\{ \sum_{j=1}^n (y_j - g(x_j))^2 \;\Big|\; g \in \mathcal{S}_{m,r} \Big\}. \qquad (20)$$

Theorem 1.3.6 Let $\hat f_{m,r}$ be the minimizer of (20). Then Theorem 1.3.4 holds
with this meaning of $\hat f_{m,r}$.


We see that the least squares spline estimator attains the optimal rate in
$L_p$, $1 \le p \le \infty$. Agarwal and Studden (1980) proved this for p = 2. These
authors also analysed the constant appearing in the asymptotic risk and proposed
some optimization within a class of spline estimators.

Note that m can be chosen equal to $\beta + 2$, whence the estimator is in $C^\beta$,
i.e., satisfies the smoothness assumption made on the regression function.

Let us now turn to the classical smoothing spline. Consider the Sobolev space
$W_2^m = \{f \mid f^{(m-1)}$ exists, is absolutely continuous, $f^{(m)} \in L_2\}$ and the minimization
problem, for r > 0,

$$\min \Big\{ n^{-1} \sum_{j=1}^n (y_j - g(x_j))^2 + r^{-2m} \|g^{(m)}\|_2^2 \;\Big|\; g \in W_2^m \Big\}. \qquad (21)$$

The solution $\hat f_{m,r}$ is unique if $\xi$ contains at least m + 1 different points, and
$\hat f_{m,r}$ is a spline which is a linear function of the data. This estimator is thus of
the linear smoothing type, with a particularly appealing heuristic motivation.
In the risk evaluation, the bias part can be treated very simply as follows.
Note that $f^0_{m,r}(x) = E_f \hat f_{m,r}(x)$, $x \in [0, 1]$, is the smoothing spline for the noise-free data $f^\xi = (f(x_j))_{j=1,\ldots,n}$. Then, if $f \in W(m, 2, L)$,

$$n^{-1} \sum_{j=1}^n (f(x_j) - f^0_{m,r}(x_j))^2 \le n^{-1} \sum_{j=1}^n (f(x_j) - f^0_{m,r}(x_j))^2 + r^{-2m} \|(f^0_{m,r})^{(m)}\|_2^2 \le r^{-2m} \|f^{(m)}\|_2^2 \le r^{-2m} L,$$

since $f^0_{m,r}$ solves (21) for $y = f^\xi$. The left-hand side of this chain is an approximation
to the bias part of the risk $\|f - f^0_{m,r}\|_2^2$. The difficulty now remains in
evaluating the variance part of the risk; this requires eigenvalue estimates for
a certain approximation to a differential operator in $L_2$. Results for the case
of uniform design were given by Wahba (1978), Craven and Wahba (1979),
Utreras (1980, 1983). These were generalized by Cox (1983, 1984) to a certain


class of nonuniform designs. Let $\Phi_n$ be the distribution function which assigns
the mass $n^{-1}$ to each point of $\xi$. Define
$$d_n = \sup_{x \in [0,1]} |\Phi_n(x) - x|;$$

$d_n$ is the Kolmogorov distance of $\Phi_n$ from the uniform distribution on [0, 1].
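The distance $d_n$ is easy to compute for a given design, because the empirical distribution function is a step function and the supremum is attained at a jump point, either from the left or from the right. A minimal sketch follows; the function name `kolmogorov_distance_from_uniform` is hypothetical.

```python
def kolmogorov_distance_from_uniform(design):
    """sup over x in [0, 1] of |Phi_n(x) - x| for the empirical d.f. Phi_n of the design."""
    xs = sorted(design)
    n = len(xs)
    dist = 0.0
    for i, x in enumerate(xs, start=1):
        # At the i-th ordered point Phi_n rises from (i - 1)/n (left limit) to i/n.
        dist = max(dist, abs(i / n - x), abs((i - 1) / n - x))
    return dist
```

For the uniform design $\{(i - 1)/(n - 1)\}$ this returns exactly $1/n$, in line with the remark that $d_n = O(n^{-1})$ for uniform designs.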


Theorem 1.3.7 (Cox, 1984) Let $\mathcal{F} = W(\beta, 2, L)$ and $\hat f_{m,r}$ be the minimizer
of (21). The relation

$$\sup_{f \in \mathcal{F}} E_f \|\hat f_{m,r} - f\|_2^2 = O(n^{-2\beta/(2\beta+1)})$$

holds if $m = \max(\beta, 2)$, $r \sim n^{1/(2\beta+1)}$, and if the condition
$$d_n = o(n^{-5/(2(2\beta+1))}) \qquad (22)$$
is fulfilled.


Note that, for the uniform design sequence, we have $d_n = O(n^{-1})$, whence
(22) is met for all natural $\beta$. The paper of Cox (1984) in fact treats the multivariate
case ($f: \mathcal{X} \to \mathbb{R}$, $\mathcal{X} \subset \mathbb{R}^k$, see point 1.3.5 (a) below). There, also limits
for $\Phi_n$ that are not uniform are admitted, namely distributions with a density
which is bounded away from 0 and $\infty$ on [0, 1]. Ragozin (1983) obtained
related results on the convergence rates for the estimation of the derivative of 
f via the smoothing spline. 

Rice and Rosenblatt (1981, 1983) also investigated the asymptotic behaviour
of smoothing splines, with special emphasis on boundary effects. In the context
of Theorem 1.3.7 their result means that if $\beta > m$, then $\hat f_{m,r}$ can attain the
optimal rate only if f satisfies some additional boundary conditions. This was
further elaborated by Cox (1984).

The smoothing spline also has an optimality property for estimating f at a
point. For any $x \in [0, 1]$ and parameter space $W(\beta, 2, L)$, it is minimax among
linear estimators of f(x); cf. Zi (1984). For the asymptotics, the following can
be conjectured. Suppose $\{\xi\}$ is the sequence of uniform designs; the sequence
$\delta_n$ for a result like Theorem 1.3.2 is $\delta_n = n^{\kappa/(2\kappa+1)}$, $\kappa = \beta - 1/2$. (For density
estimation such a result is known; cf. Wahba, 1975). The smoothing spline
$\hat f_{\beta,r}$ attains this rate (for $l(x) = x^2$) for a choice $r \sim n^{1/(2\beta)}$.


1.3.4 Optimal rates and exact constants 


The optimality results for NP regression discussed so far are rather weak com- 
pared with what is known in the parametric case. Consider the minimax risk 
as given by (2), for p = 2. In the k-dimensional parametric case, one has results 


of the type
$$\lim_n n \Delta_n(\mathcal{F}) = k \qquad (23)$$
(see (6)), while the optimal rate statements in the NP case ensure the existence
of positive constants $C_1$, $C_2$ such that

$$C_1 (1 + o(1)) \le n^{2\beta/(2\beta+1)} \Delta_n(\mathcal{F}) \le C_2 (1 + o(1)). \qquad (24)$$


Now, certainly, it would be desirable to specify constants $C_1 = C_2$ and thus
to obtain an analogue of (23), i.e., of Fisher’s bound for asymptotic variances
(or of the Hajek AM theorem). The methods of risk evaluation mentioned up
to now do not provide this information; especially with regard to $C_1$ they
are of qualitative nature. While in the general $L_p$ case it appears difficult to
go further, the Hilbertian structure of $L_2$ allows some improvement. A significant
result in this respect is due to Pinsker (1980). It concerns the estimation
of a function continuously observed in Gaussian white noise. Consider observations
given by a stochastic differential

$$dy(t) = f(t)\,dt + n^{-1/2}\,dW(t), \quad t \in [0, 1], \qquad (25)$$

where dW(t) is the derivative of the standard Wiener process, and f is a function
from a parameter set $\mathcal{F} \subset L_2$. The problem of NP estimation of f in this
model, with a small noise asymptotic $n \to \infty$, has many traits in common
with NP density or regression estimation; for optimal rates of convergence
see IKh (1981, chapter 7, 1980a). Let $\{\varphi_j\}_{j \in \mathbb{Z}}$ be an orthonormal basis of $L_2$,
$f_j = (f, \varphi_j)$ be the Fourier coefficients of f. Assume that $\mathcal{F}$ is an ellipsoid:

$$\mathcal{F} = \Big\{ f \;\Big|\; \sum_{j \in \mathbb{Z}} a_j f_j^2 \le L \Big\},$$

where $a_j$, $j \in \mathbb{Z}$ are coefficients such that $a_j \to \infty$ for $j \to \pm\infty$. It is known
that the periodic Sobolev class $\tilde W(\beta, 2, L)$ (see (13)) can be described as an
ellipsoid in terms of the classical Fourier basis of $L_2$; then $a_j = 1 + (2\pi j)^{2\beta}$,
$j \in \mathbb{Z}$. Consider the minimax risk for a squared $L_2$-loss:

$$\Delta_n(\mathcal{F}) = \inf_{\hat f} \sup_{f \in \mathcal{F}} E_f \|\hat f - f\|_2^2.$$
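Theorem 1.3.8 (Pinsker, 1980). Suppose that observations are given by the
model (25). Let $\mathcal{F} = \tilde W(\beta, 2, L)$ be the periodic Sobolev class of (13). Then

$$\lim_n n^{2\beta/(2\beta+1)} \Delta_n(\mathcal{F}) = \gamma(\beta, L),$$

$$\gamma(\beta, L) = (L (2\beta + 1))^{1/(2\beta+1)} \Big( \frac{\beta}{\pi (\beta + 1)} \Big)^{2\beta/(2\beta+1)}.$$
This result indeed specifies $C_1 = C_2 = \gamma(\beta, L)$ in (24); it hence provides
the exact asymptotic minimax constant. To outline the basic idea, let (25)
be decomposed into Fourier coefficients:

$$y_j = \int \varphi_j\,dy = f_j + n^{-1/2} \xi_j, \quad j \in \mathbb{Z}, \qquad (26)$$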


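where $\xi_j$, $j \in \mathbb{Z}$ are i.i.d. standard normal. Estimating f is equivalent to
estimating $f_j$, $j \in \mathbb{Z}$. Consider linear estimators of $f_j$:

$$\hat f_j = c_j y_j, \quad j \in \mathbb{Z},$$

where $c_j$ are fixed coefficients. Let $\sigma = \{\sigma_j^2\}_{j \in \mathbb{Z}}$ be some sequence of positive
numbers, $c = \{c_j\}_{j \in \mathbb{Z}}$, $\sum_{j \in \mathbb{Z}} c_j^2 < \infty$, and define

$$T(c, \sigma) = \sum_{j \in \mathbb{Z}} ((1 - c_j)^2 \sigma_j^2 + n^{-1} c_j^2).$$

The expression $T(c, \sigma)$ can be interpreted in two ways:


(a) As the risk $E_f \|\hat f - f\|_2^2$ of a linear estimator $\hat f$ based on c if $f_j^2 = \sigma_j^2$, $j \in \mathbb{Z}$.
Indeed,

$$E_f \|\hat f - f\|_2^2 = \sum_{j \in \mathbb{Z}} E_f (\hat f_j - f_j)^2 = \sum_{j \in \mathbb{Z}} ((1 - c_j)^2 f_j^2 + n^{-1} c_j^2).$$
(b) As the mixed risk $\int E_f \|\hat f - f\|_2^2\,d\pi_\sigma(f)$ of this linear estimator if the prior
$\pi_\sigma$ makes the $f_j$ independent $N(0, \sigma_j^2)$.


Let $\mathcal{F}^* = \{\sigma \mid \sum_{j \in \mathbb{Z}} a_j \sigma_j^2 \le L\}$ and observe that

$$\inf_c \sup_{\sigma \in \mathcal{F}^*} T(c, \sigma) = \sup_{\sigma \in \mathcal{F}^*} \inf_c T(c, \sigma). \qquad (27)$$


Indeed, the conditions of the von Neumann minimax theorem (see, e.g.,
Balakrishnan, 1976) are fulfilled; in particular, $T(c, \sigma)$ is convex in c, concave
(linear) in $\sigma$, and both domains are convex. A saddle point $(c^*, \sigma^*)$ is

$$c_j^* = (1 - \lambda a_j^{1/2})_+, \quad \sigma_j^{*2} = n^{-1} (\lambda^{-1} a_j^{-1/2} - 1)_+, \quad j \in \mathbb{Z}, \qquad (28)$$

where $\lambda$ is a solution of
$$\sum_{j \in \mathbb{Z}} a_j n^{-1} (\lambda^{-1} a_j^{-1/2} - 1)_+ = L. \qquad (29)$$


Since $T(c^*, \sigma^*)$ is the minimax risk among linear estimators, we have
$T(c^*, \sigma^*) \ge \Delta_n(\mathcal{F})$. But $T(c^*, \sigma^*)$ is also the Bayes risk for a prior $\pi_{\sigma^*}$; indeed, since
$\pi_{\sigma^*}$ is Gaussian, the Bayes estimator is linear, and the right-hand side of (27)
is a Bayes risk. The prior $\pi_{\sigma^*}$ is not concentrated on $\mathcal{F}$; if it were, then one
could conclude $\Delta_n(\mathcal{F}) \ge T(c^*, \sigma^*)$. But it can be shown that $\pi_{\sigma^*}$ concentrates
on $\mathcal{F}$ in some sense as $n \to \infty$, whence

$$\Delta_n(\mathcal{F}) = T(c^*, \sigma^*) (1 + o(1)), \quad n \to \infty.$$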


The asymptotics of $T(c^*, \sigma^*)$ can easily be calculated for the ellipsoid $\tilde W(\beta, 2, L)$,
yielding the constant $\gamma(\beta, L)$. An efficient estimator in this sense is given by
the linear smoother $c^*$ of (28). Note that (29) implies $\lambda \to 0$ for $n \to \infty$, so
that this estimator conforms to the usual linear filtering scheme where the
number of nonzero coefficients in the filter increases with n.
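The saddle-point computation can be checked numerically. The sketch below rests on several assumptions that should be read as such: the weights are taken in the form $c_j^* = (1 - \lambda a_j^{1/2})_+$ with $\lambda$ solving $\sum_j a_j n^{-1} (\lambda^{-1} a_j^{-1/2} - 1)_+ = L$, the coefficients are $a_j = 1 + (2\pi j)^{2\beta}$, and the closed form used for $\gamma(\beta, L)$ is the standard Pinsker constant for a Sobolev ellipsoid. With these weights each summand of $T(c^*, \sigma^*)$ reduces to $n^{-1} c_j^*$, which the code uses. All function names are hypothetical.

```python
import math


def a_coeff(j, beta):
    """Ellipsoid coefficient a_j = 1 + (2 pi j)^(2 beta)."""
    return 1.0 + (2.0 * math.pi * abs(j)) ** (2 * beta)


def constraint_sum(lam, n, beta):
    """sum over j in Z of a_j * n^{-1} * (1 / (lam * sqrt(a_j)) - 1)_+ ."""
    total, j = 0.0, 0
    while True:
        a = a_coeff(j, beta)
        sigma2 = (1.0 / (lam * math.sqrt(a)) - 1.0) / n
        if sigma2 <= 0.0:  # a_j increases with |j|, so all later terms vanish too
            return total
        total += a * sigma2 * (1 if j == 0 else 2)  # terms j and -j coincide
        j += 1


def solve_lambda(n, beta, L):
    """Solve constraint_sum(lam) = L by bisection on a logarithmic scale."""
    lo = hi = 1.0  # constraint_sum(1.0) = 0 < L
    while constraint_sum(lo, n, beta) < L:
        hi, lo = lo, lo / 2.0
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if constraint_sum(mid, n, beta) > L:
            lo = mid  # sum too large: lambda must increase
        else:
            hi = mid
    return math.sqrt(lo * hi)


def linear_minimax_risk(n, beta, L):
    """Return (T(c*, sigma*), lambda), using T(c*, sigma*) = n^{-1} sum_j c_j*."""
    lam = solve_lambda(n, beta, L)
    risk, j = 0.0, 0
    while True:
        c = 1.0 - lam * math.sqrt(a_coeff(j, beta))
        if c <= 0.0:
            return risk, lam
        risk += c * (1 if j == 0 else 2) / n
        j += 1


def pinsker_constant(beta, L):
    """gamma(beta, L) = (L (2 beta + 1))^(1/(2 beta + 1)) (beta / (pi (beta + 1)))^(2 beta/(2 beta + 1))."""
    e = 2.0 * beta + 1.0
    return (L * e) ** (1.0 / e) * (beta / (math.pi * (beta + 1.0))) ** (2.0 * beta / e)
```

For large n the product of the returned risk with $n^{2\beta/(2\beta+1)}$ should approach `pinsker_constant(beta, L)`, and the returned $\lambda$ should shrink as n grows, so that more filter coefficients become nonzero.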

A variant of this method leading to optimal kernel estimators was developed
by Golubev (1982, 1987), using the Fourier transform of functions $g \in L_2(\mathbb{R})$
as a tool:


    $\hat g(t) = \int_{\mathbb{R}} \exp(2\pi i t x)\, g(x)\, dx.$


Suppose that, in (25), the function $f$ satisfies some different boundary conditions
on $[0, 1]$: $f \in \tilde W(\beta, 2, L)$, where


    $\tilde W(\beta, 2, L) = \{f \in W(\beta, 2, L) : f^{(k)}(0) = f^{(k)}(1) = 0,\ k = 0, \dots, \beta - 1\}.$


This class can conveniently be characterized in terms of the Fourier transform:
If $f \in \tilde W(\beta, 2, L)$, then its zero extension to $\mathbb{R}$ ($f = 0$ on $[0, 1]^c$) satisfies


    $\int_{\mathbb{R}} |\hat f(t)|^2 (2\pi t)^{2\beta}\, dt \le L.$    (30)
Consider kernel estimators of $f$ in (25):

    $\hat f_n(x) = \int_0^1 K_r(x - t)\, dy(t), \qquad K_r(x) = r K(r x), \quad r > 0.$    (31)


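Suppose $K \in L_2(\mathbb{R})$. The risk is


    $E_f\|\hat f_n - f\|_2^2 \le \int |(1 - \hat K_r(t))\, \hat f(t)|^2\, dt + n^{-1} \int |\hat K_r(t)|^2\, dt.$


Using (30), the fact that $\hat K_r(t) = \hat K(r^{-1}t)$, and a change of variables, we obtain

    $E_f\|\hat f_n - f\|_2^2 \le L \operatorname{ess\,sup}_t |1 - \hat K(t)|^2 (2\pi r t)^{-2\beta} + n^{-1} r \int |\hat K(t)|^2\, dt.$

Put $r = n^{1/(2\beta+1)}$; then

    $E_f\|\hat f_n - f\|_2^2\, n^{2\beta/(2\beta+1)} \le L \operatorname{ess\,sup}_t |1 - \hat K(t)|^2 (2\pi t)^{-2\beta} + \int |\hat K(t)|^2\, dt.$


Denote the functional of $K$ which is on the right-hand side by $U_{\beta,L}(K)$. Observe
that $U_{\beta,L}(K) < \infty$ implies that the kernel $K$ satisfies the usual conditions (17)
(for $m = \beta$). The kernel $K^*$, which minimizes $U_{\beta,L}(K)$, is given by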


    $\hat K^*(t) = (1 - \lambda |t|^\beta)_+,$    (32)

where $\lambda$ solves


    $\int \bigl(\lambda^{-1} |t|^\beta - |t|^{2\beta}\bigr)_+\, dt = L (2\pi)^{-2\beta}.$


It turns out that

    $\gamma(\beta, L) = \inf_K U_{\beta,L}(K) = U_{\beta,L}(K^*).$
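
The bandwidth choice $r = n^{1/(2\beta+1)}$ used above is the one that balances the bias and variance terms of the risk bound. As a worked step, write $A$ and $B$ for the two kernel-dependent constants (these letters are introduced here only for illustration and are not notation of the text):

```latex
\begin{align*}
E_f\|\hat f_n - f\|_2^2 &\le A\, r^{-2\beta} + B\, n^{-1} r,
  \qquad A = L \operatorname*{ess\,sup}_t |1 - \hat K(t)|^2 (2\pi t)^{-2\beta},
  \quad B = \int |\hat K(t)|^2\, dt, \\
r = n^{1/(2\beta+1)} &\;\Longrightarrow\;
  r^{-2\beta} = n^{-2\beta/(2\beta+1)}, \qquad
  n^{-1} r = n^{-1 + 1/(2\beta+1)} = n^{-2\beta/(2\beta+1)}, \\
E_f\|\hat f_n - f\|_2^2 &\le n^{-2\beta/(2\beta+1)} (A + B)
  = n^{-2\beta/(2\beta+1)}\, U_{\beta,L}(K).
\end{align*}
```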


Theorem 1.3.9 (Golubev, 1982) Suppose that observations are given by the
model (25). Let $\mathcal{F} = \tilde W(\beta, 2, L)$. Then $\Delta_n(\mathcal{F})$ satisfies the same relation as in
Theorem 1.3.8. This risk bound is attained by the estimator (31) with $r = n^{1/(2\beta+1)}$
and kernel $K^*$ given by (32).


This is a rather strong result on optimal kernel estimation in $L_2$, a problem
which has some history in the literature (see Watson and Leadbetter, 1963;
Davis, 1977; Gasser and Müller, 1979). Note that the optimal kernel $K^*$ has
support $\mathbb{R}$, since its Fourier transform has compact support.

We mention that similar results on the asymptotic minimax constant have 
also been obtained in models of NP spectral and probability density estimation 
(LKh, 1982b; Pinsker and Yefroimovich, 1981, 1982). 

Consider now the NP regression model with discrete observations (1). To
describe the exact asymptotics of $\Delta_n(\mathcal{F})$ in this model, we employ a spline
approach. In the preceding results on the AM constant, boundary conditions
were imposed on the function $f$. It is of interest to dispense with these in both
the discrete and continuous observation cases. Thus we assume that
$\mathcal{F} = W(\beta, 2, L)$, the Sobolev class without boundary conditions. It will also
be supposed that $\xi$ is the uniform design of size $n$. First we ask what kind of
restriction the smoothness assumption $f \in W(\beta, 2, L)$ implies for the function
values $f^\xi$. An answer to this is provided by spline theory. Consider the
minimization problem


    $\min \{\|g^{(\beta)}\|_2^2 : g \in W_2^\beta,\ g^\xi = f^\xi\}.$


The solution exists and is unique for $n \ge \beta$, and is the natural polynomial
interpolation spline, denoted by $\sigma(f^\xi)$. Then


    $\|\sigma(f^\xi)^{(\beta)}\|_2^2 \le \|f^{(\beta)}\|_2^2 \le L.$


Now $\sigma(f^\xi)$ is known to be linear in $f^\xi$. Hence, $\|\sigma(f^\xi)^{(\beta)}\|_2^2$ is a quadratic form, with
matrix $\Gamma_n$, say, and we have


    $(f^\xi)' \Gamma_n f^\xi \le L.$


Thus we know that the parameter space for the vector $f^\xi$ is contained in an
ellipsoid in $\mathbb{R}^n$. (It would coincide with this ellipsoid if $\mathcal{F}$ were defined by
$\|f^{(\beta)}\|_2^2 \le L$ rather than by $\|f\|_2^2 + \|f^{(\beta)}\|_2^2 \le L$. This difference is unessential
in the sequel.) Consider the loss


    $n^{-1} \sum_{j=1}^{n} (\hat f(x_j) - f(x_j))^2.$    (33)


Let $\Gamma_n = O \Lambda O'$ be a spectral decomposition of $\Gamma_n$, where $O$ is an orthogonal
$n \times n$ matrix and $\Lambda$ is diagonal with diagonal elements $\lambda_{11} \le \lambda_{22} \le \dots \le \lambda_{nn}$.
Let $g = n^{-1/2} O' f^\xi$ be a transformed parameter vector. For $g$ we have
observations of structure (26), though of finite dimension $n$. The loss (33) transforms to
$\sum_{j=1}^{n} (\hat g_j - g_j)^2$, and $g$ is contained in an ellipsoid in $\mathbb{R}^n$: $\sum_{j=1}^{n} a_j g_j^2 \le L$, where the
$a_j$ coincide with $n \lambda_{jj}$. The method of Pinsker (1980) can then be applied to
this observation scheme; it describes the exact asymptotics of $\Delta_n(\mathcal{F})$. The
concrete rate and constant then depend on the numbers $a_j$ and on $L$. The rate
is already known to be $n^{-2\beta/(2\beta+1)}$ (cf. Theorems 1.3.1, 1.3.4), and it turns out
that the constant is the same as in Theorem 1.3.8. The proof requires an
eigenvalue estimate for $n\Gamma_n$, viz. it is to be shown that the $a_j$ tend to behave like
$(\pi j)^{2\beta}$. The problem of spectral estimates for the matrix $\Gamma_n$ associated with
spline interpolation was treated earlier by Craven and Wahba (1979) and
Utreras (1980, 1983).

To describe the asymptotically optimal estimator, let $r = n^{1/(2\beta+1)}$ and
define numbers


    $\hat c_j = 1, \quad 1 \le j \le r/\log n; \qquad \hat c_j = \hat K^*(j/2r), \quad r/\log n < j \le n,$


with $\hat K^*$ from (32). The $\hat c_j$ represent a slightly modified version of the optimal
filter $c^*$ from (28); the number of nonzero $\hat c_j$ is of order $r$. Define


    $\hat C = (\hat c_j \delta_{ij})_{i,j = 1, \dots, n}$    ($\delta_{ij}$: Kronecker symbol)


and let
    $\hat f^\xi = O \hat C O' y$


be an estimator of $f^\xi$. The estimator of $f$ will be the interpolating spline
    $\hat f = \sigma(\hat f^\xi).$    (34)


Theorem 1.3.10 Suppose that observations are given by the model (1), and that
$\xi$ is the uniform design of size $n$. Let $\mathcal{F} = W(\beta, 2, L)$. Then $\Delta_n(\mathcal{F})$ satisfies the
same relation as in Theorem 1.3.8. This risk bound is attained by the estimator
(34).

For details see Nussbaum (1985). The optimal estimator (34) may be viewed
as a smoothing spline different from the classical one (21). It is known that the
latter corresponds to a filter


    $(1 + r (\pi j)^{2\beta})^{-1}$


(approximately; here $r$ is the number appearing in (21)). Hence it will not be
optimal in the sense of the AM constant, whatever the choice of $r$. The optimality
properties of (21) for estimating $f$ at a point were mentioned earlier.
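
The gap between the classical spline filter and a filter of the form (28) can be made visible numerically. The sketch below is an illustration under assumptions, not a result from the text: the ellipsoid $\sum_j a_j f_j^2 \le L$ is truncated to 200 coordinates with $a_j = (\pi j)^{2\beta}$, and each one-parameter family is optimized by a simple grid search. It uses the elementary fact that the supremum of $\sum_j (1 - c_j)^2 f_j^2$ over the ellipsoid equals $L \max_j (1 - c_j)^2 / a_j$. All function names are hypothetical.

```python
import math

def worst_case_risk(c, a, L, n):
    """sup over {f: sum_j a_j f_j^2 <= L} of sum_j ((1 - c_j)^2 f_j^2 + c_j^2 / n)."""
    bias = L * max((1.0 - cj) ** 2 / aj for cj, aj in zip(c, a))
    variance = sum(cj ** 2 for cj in c) / n
    return bias + variance

def best_over_grid(make_filter, a, L, n, lo, hi, points=400):
    """Minimize the worst-case risk over a log-spaced grid of the filter parameter."""
    best_risk, best_param = float("inf"), None
    for k in range(points):
        p = lo * (hi / lo) ** (k / (points - 1))
        risk = worst_case_risk(make_filter(p, a), a, L, n)
        if risk < best_risk:
            best_risk, best_param = risk, p
    return best_risk, best_param

def pinsker_type(lam, a):
    """Filter of the form (28): c_j = (1 - lam * sqrt(a_j))_+."""
    return [max(1.0 - lam * math.sqrt(aj), 0.0) for aj in a]

def spline_type(r, a):
    """Classical smoothing-spline filter: c_j = 1 / (1 + r * a_j)."""
    return [1.0 / (1.0 + r * aj) for aj in a]

# Illustrative (assumed) parameters.
beta, L, n = 2, 1.0, 10 ** 6
a = [(math.pi * j) ** (2 * beta) for j in range(1, 201)]
risk_pinsker, lam_best = best_over_grid(pinsker_type, a, L, n, 1e-6, 1e-1)
risk_spline, r_best = best_over_grid(spline_type, a, L, n, 1e-9, 1e-2)
print(risk_spline / risk_pinsker)   # ratio of the two worst-case risks on this grid
```

In this truncated setting the spline-type family stays strictly above the filter of type (28) for every grid value of `r`, which is the finite-dimensional counterpart of the non-optimality statement above.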




A method to remove the boundary conditions on $f$ in Theorem 1.3.9 within
the framework of the kernel method was proposed by Golubev (1987).
It amounts to a boundary modification similar to the one used by Gasser and
Müller (1979) (cf. Theorem 1.3.5). In the boundary regions, any boundary
kernel fulfilling (17) is used. In the interior of the interval, a sequence of kernels
with compact support, fulfilling (17), which approximates the optimal kernel
$K^*$ is used. Since $K^*$ has noncompact support, it cannot itself be used in this
scheme because of the boundary effect. We also mention the paper of Golubev
(1984), which treats the exact asymptotics of $\Delta_n(\mathcal{F})$ for a regression model
with an expanding interval of observation.

Results like Theorem 1.3.10 can also be established for nonuniform designs,
e.g. for those fulfilling condition (22), but we shall not dwell on this. Instead,
for some more basic insight, we note some results of Sacks and Strawderman
(1982) concerning the estimation of $f$ at a point $x$. These authors show that,
for some smoothness classes $\mathcal{F}$ and quadratic loss, linear estimators of $f(x)$
attain the optimal rate of convergence but not the best constant in the AM sense.
This is established by constructing nonlinear improvements of minimax linear
estimators. Hence, the linear method which led to the result of Theorem 1.3.8
is not available in these cases, and the calculation of exact AM constants appears
to be substantially more difficult for the estimation of $f$ at a point. This also
seems to be the case for losses in $L_p$, $p \ne 2$.


1.3.5 Some further topics 


(a) The multivariate case 


Consider a regression function of several variables: $f: \mathcal{X} \to \mathbb{R}$, $\mathcal{X} \subset \mathbb{R}^k$.
Assume that $\mathcal{X}$ is open and bounded. Convergence rates for estimating $f$
at a point were established by Stone (1980). If $\mathcal{F}$ is a smoothness class described
in terms of partial derivatives up to order $\beta$, the optimal rate (for squared
error) is $n^{-2\beta/(2\beta+k)}$. For estimating with global loss in $L_p(\mathcal{X})$, $1 \le p \le \infty$,
lower risk bounds analogous to those in Theorem 1.3.1 have been found by
Stone (1982) and Nussbaum (1982). To show the attainment of these bounds,
for domains $\mathcal{X}$ of sufficiently arbitrary shape, the edge problem associated
with a curvilinear boundary has to be solved. Using piecewise polynomial
estimation, Stone (1982) proved attainment when the loss is defined in $L_p(\mathcal{X}^*)$,
where $\mathcal{X}^*$ is some subset of $\mathcal{X}$ with positive distance from the boundary of $\mathcal{X}$.
Cox (1984) established attainment for multivariate smoothing splines, for
squared $L_2(\mathcal{X})$-loss. The conditions on $\mathcal{X}$ are that $\mathcal{X}$ is compact, simply connected
and has a boundary of class $C^\infty$. It is possible to prove attainment of optimal
rates for piecewise polynomial estimation, viz. for loss in $L_p(\mathcal{X})$, $1 \le p \le \infty$,
and compact $\mathcal{X}$ with Lipschitz boundary; see Nussbaum (1986). Even smooth
linear spline estimators can be employed for this purpose, i.e., linear
combinations of multivariate B-splines.
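
To see how quickly the optimal rate deteriorates with the dimension $k$, the exponent $2\beta/(2\beta+k)$ can be tabulated. This is only an arithmetic illustration: constants are ignored, and the function names and the target value 0.01 are assumptions made for the example.

```python
def optimal_rate_exponent(beta, k):
    """Exponent e in the squared-error rate n**(-e), with e = 2*beta / (2*beta + k)."""
    return 2.0 * beta / (2.0 * beta + k)

def sample_size_for(target, beta, k):
    """Value of n at which n**(-e) equals the target (all constants ignored)."""
    return target ** (-1.0 / optimal_rate_exponent(beta, k))

# Smoothness beta = 2 in dimensions 1, 2, 5, 10.
for k in (1, 2, 5, 10):
    print(k, optimal_rate_exponent(2, k), sample_size_for(0.01, 2, k))
```

For fixed smoothness the exponent tends to zero as $k$ grows, so the sample size needed for a given accuracy increases very rapidly with the dimension.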




(b) Nonlinear estimators and general aspects 


Up to now, only estimators which are linear in the data have been discussed.
The direct application of the method of maximum likelihood, for the parameter
spaces $\mathcal{F}$ considered, can provide well-defined estimators in the present model,
as opposed to density estimation. This leads to nonlinear estimators in general;
see Nemirovski, Polyak, and Tsybakov (1984). They attain optimal rates in
some cases where linear estimators fail. Discretized ML estimators have been
employed in connection with the ‘method of sieves’ (see Geman and Hwang, 1982),
which has been proposed as a unifying concept for estimators in NP problems.
A unifying concept for linear estimators is the delta method (Susarla and
Walter, 1981). Optimal rates of convergence in an abstract general setting of
NP estimation have been studied by Birgé (1983). A recent monograph on
general curve estimation is Prakasa Rao (1984).

For a survey of NP regression models with random $(x_i, y_i)$ see Collomb (1981).
Limit theorems for estimators and various probabilistic properties have
received much attention; see e.g. Liero (1982).


(c) Robustness 


The normal distributional assumption on the i.i.d. disturbance variables $\varepsilon_i$
is not necessary for the results on optimal rates; some regularity conditions
suffice. Some papers address robustness against variation in the error
distribution. It is possible to ‘robustify’ the linear smoothers such as kernel and
spline methods; the result is a nonlinear smoother whose robustness is for
example reflected in an extended optimal rate property. The smoothing spline
case has been dealt with by Cox (1983); for the kernel method we mention
Tsybakov (1982) and Härdle (1984).


(d) Adaptive optimal smoothing 


The optimality results for the smoothing methods S,,, discussed so far were 
proven for certain choices of the smoothing parameter $r$; these choices depend
on the prior information on the function class $\mathcal{F}$. In practice, information on $\mathcal{F}$,
for instance on the bounds for the derivative, may be rather vague, so that
it is of major interest to derive a good choice of $r$ from the data. Most of the
methods proposed are based on cross-validation or variants of it; see e.g.
Craven and Wahba (1979), Wahba (1981), Utreras (1980). Consistency was
proved by Li (1984); a first result on efficiency is due to Hall (1983, 1984).
This efficiency result, though it is persuasive, does not state an actual
asymptotic risk optimality of the adaptive (cross-validated) estimator, however. A
method not related to cross-validation which achieves this goal has been
proposed by Pinsker and Yefroimovich (1984). Within the ellipsoid framework
described in Section 1.3.4, this estimator yields the optimal rate and constant for
the squared $L_2$-risk, although it does not depend on the concrete parameters
of the ellipsoid. Initial results relate to the continuous time model (25) but the
method is applicable to NP regression too. For further advances on the
efficiency of cross-validation see Rice (1984), Speckman (1985), Härdle and Marron
(1985).
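
The idea of choosing the smoothing parameter from the data can be illustrated with ordinary leave-one-out cross-validation. The sketch below is a generic illustration and not one of the specific procedures cited above: it uses a weight-normalized (Nadaraya-Watson type) kernel estimator with $K_r(u) = rK(ru)$ and Gaussian $K$, a hypothetical regression function, simulated noise, and a small grid of candidate values of $r$. All names are assumptions.

```python
import math
import random

def kernel_fit_loo(x, y, r, i):
    """Leave-one-out kernel estimate at x[i], with Gaussian weights exp(-(r*u)^2 / 2).
    The weights are normalized by their sum (a simplification chosen for this sketch)."""
    num = den = 0.0
    for j in range(len(x)):
        if j == i:
            continue
        w = math.exp(-0.5 * (r * (x[i] - x[j])) ** 2)
        num += w * y[j]
        den += w
    return num / den if den > 0.0 else 0.0

def cv_score(x, y, r):
    """Mean squared leave-one-out prediction error for smoothing parameter r."""
    n = len(x)
    return sum((y[i] - kernel_fit_loo(x, y, r, i)) ** 2 for i in range(n)) / n

def choose_r(x, y, grid):
    """Pick the grid value of r with the smallest cross-validation score."""
    return min(grid, key=lambda r: cv_score(x, y, r))

random.seed(0)
n = 200
x = [(i + 0.5) / n for i in range(n)]                   # uniform design on [0, 1]
f = lambda t: math.sin(2.0 * math.pi * t)               # hypothetical regression function
y = [f(t) + 0.3 * random.gauss(0.0, 1.0) for t in x]    # simulated noisy observations
grid = [2.0 ** k for k in range(0, 10)]                 # candidate values of r
r_cv = choose_r(x, y, grid)
print(r_cv, cv_score(x, y, r_cv))
```

Very small values of `r` oversmooth and very large values nearly interpolate the noise, so the cross-validation score is minimized at an intermediate grid value.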


1.4 References


1.4.1 References for Section 1.1 


Agha, M. (1971). ‘A direct method for fitting linear combinations of exponentials.’ 
Biometrics, 27, 399—413. 

Akahira, M. and Takeuchi, M. (1976). ‘On the second order asymptotic efficiencies of
estimators.’ Proc. 3rd Japan-USSR Symp. on Prob. Theory, Lecture Notes in Math.
550, Springer Verlag, 1976, 604–638.

Amari, Shun-Ichi (1982). ‘Differential geometry of curved exponential families:
curvature and information loss.’ Ann. Statist., 10, 357–385.

Anderson, T. W. (1958). An Introduction to Multivariate Analysis. John Wiley, New 
York. 

Anderson, T. W., and Taylor, J. B. (1976). ‘Strong consistency of least squares esti- 
mators in normal linear regressions.’ Ann. Statist., 4, 788 —790. 

Bahadur, R. R. (1964). ‘On Fisher’s bound for asymptotic variance.’ Ann. Math. Statist.,
35, 1545–1552.

Bahadur, R. R. (1967). ‘Rates of convergence of estimates and test statistics.’ Ann. 
Math. Statist., 39, 303—324. 

Bard, J. (1974). Nonlinear Parameter Estimation. Academic Press, New York. 

Barham, R. H., and Drane, W. (1972). ‘An algorithm for least squares estimation of
nonlinear parameters when some of the parameters are linear.’ Technometrics, 14,
757–766.

Barnett, W. A. (1976). ‘Maximum likelihood and iterated Aitken estimation of non- 
linear systems of equations.’ J. Amer. Statist. Assoc., 71, 354—360. 

Bates, D. M., and Watts, D. G. (1980). ‘Relative measures of nonlinearity (with
discussion).’ J. Royal Statist. Soc., Ser. B, 42, 1–25.

Beale, E. M. L. (1960). ‘Confidence regions in nonlinear estimation.’ J. Royal Statist.
Soc., Ser. B, 22, 41–71.

Bird, H. A., and Milliken, G. A. (1976). ‘Estimable functions in the nonlinear models.’ 
Commun. Statist., A5, 999—1012. 

Box, G. E. P., and Coutie, G. A. (1956). ‘Application of digital computers in the
exploration of functional relationships.’ Proc. I. E. E., 108, Part B, Suppl. Nr. 1, 100–107.

Box, M. J. (1971). ‘Bias in nonlinear estimation.’ J. Royal Statist. Soc., Ser. B, 33,
171–201.

Bunke, H. (1976). ‘Simple consistent estimation in nonlinear regression by data
transformations and design of experiments.’ Math. Operationsforsch. Statist., 7, 715–719.

Bunke, H. (1977). ‘Linear parameter estimation in nonlinear regression models by pre- 
vious data transformations.’ Biometr. J., 19, 253—256. 

Bunke, H. (1981). ‘A note on parameter estimation in inadequate nonlinear regression
models.’ Math. Operationsforsch. Statist., Ser. Statist., 12, 7–11.

Bunke, H., and Bunke, O. (1974). ‘Identifiability and estimability.’ Math. Operations- 
forsch. Statist., 5, 223—233. 

Bunke, H., and Bunke, O. (Eds.) (1986). Statistical Inference in Linear Models. John 
Wiley, Chichester. 




Bunke, H., Henschke, K., Strüby, R., and Wisotzki, C. (1977). ‘Parameter estimation
in nonlinear regression models.’ Math. Operationsforsch. Statist., Ser. Statist., 8,
23–40.

Bunke, H., and Schmidt, W. H. (1980). ‘Asymptotic results on nonlinear approximation 
of regression functions and weighted least squares.’ Math. Operationsforsch. Statist., 
Ser. Statist., 11, 3—22. 

Bunke, O., and Grabowski, B. (1978). ‘A procedure for model choice or variable selection
with controlled model specification error.’ Math. Operationsforsch. Statist., Ser.
Statist., 9, 483–497.

Chanda, K. C. (1976). ‘Efficiency and robustness of least squares estimators.’ Sankhya, 
Ser. B, 38, 153—163. 

Chibisov, D. M. (1972). ‘An asymptotic expansion for the distribution of a statistics 
that permits asymptotic expansion.’ Theory of Probability and its Applications, 17, 
658—668, (in Russian). 

Chibisov, D. M. (1973a). ‘An asymptotic expansion for a certain class of estimators 
that includes maximum likelihood estimators.’ Theory of Probability and its Appli- 
cations, 18, 302—310, (in Russian). 

Chibisov, D. M. (1973b). ‘An asymptotic expansion for the distribution of sums of a 
special form, with an application to minimum contrast estimates.’ Theory of Pro- 
bability and its Applications, 18, 689—702, (in Russian). 

Cox, D. R. (1977). ‘Nonlinear models, residuals and transformations.’ Math. Operations- 
forsch. Statist., Ser. Statist., 8, 3—22. 

Draper, N.R., and Smith, H. (1966). Applied Regression Analysis. John Wiley, New 
York. 

Drygas, H. (1971). ‘Consistency of the least-squares and Gauss-Markov estimators in 
regression models.’ Z. Wahrscheinlichkeitstheorie verw. Gebiete, 17, 309—326. 

Drygas, H. (1976). ‘Weak and strong consistency of the least-squares estimators in 
regression models.’ Z. Wahrscheinlichkeitstheorie verw. Gebiete, 34, 119—127. 

Eicker, F. (1963a). ‘Central limit theorems for families of sequences of random variables.’
Ann. Math. Statist., 34, 439–446.

Eicker, F. (1963b). ‘Asymptotic normality and consistency of the least-squares
estimators for families of linear regressions.’ Ann. Math. Statist., 34, 447–456.

Eicker, F. (1965). ‘Limit theorems for regressions with unequal and dependent errors.’
Proc. of the 5th Berkeley Symp. Math. Statist. Prob., 1, 59–82. Univ. California Press,
Berkeley.

Eicker, F. (1966). ‘A multivariate central limit theorem for random linear vector forms.’
Ann. Math. Statist., 37, 1825–1828.

Fedorov, V. V. (1977). ‘Estimation of regression parameters in the case of vector valued
observations.’ In: Regression Experiments (Ed. V. V. Nalimov). Moscow, Izd. Mosk.
Univ. (in Russian).

Gallant, A. R. (1975). ‘Testing a subset of the parameters of a nonlinear regression 
model.’ J. Amer. Statist. Assoc., 70, 927—932. 

Gleser, L. J. (1965). ‘On the asymptotic theory of fixed-size sequential confidence 
bounds for linear regression parameters.’ Ann. Math. Statist., 36, 463—467. 

Gleser, L. J. (1966). ‘Correction to: On the asymptotic theory of fixed-size sequential 
confidence bounds for linear regression parameters.’ Ann. Math. Statist., 37, 1053 
to 1055. 

Goldberger, A. S. (1968). ‘The interpretation and estimation of Cobb-Douglas functions.’
Econometrica, 36, 464–472.

Goldfeld, S. M., and Quandt, R. E. (1972). Nonlinear Methods in Econometrics.
North-Holland Publishing Company, Amsterdam–London.

Grossmann, W. (1976). ‘Robust nonlinear regression.’ In: Compstat 1976 (Ed. G.
Bruckmann), Physica-Verlag, Würzburg, 146–152.




Hamilton, D. C., Watts, D. G., and Bates, D. M. (1982). ‘Accounting for intrinsic
nonlinearity in nonlinear regression parameter inference regions.’ Ann. Statist., 10.

Hannan, E. J. (1971). ‘Nonlinear time series regression.’ J. Appl. Prob., 8, 767–780.

Hartley, H. O. (1971). ‘The modified Gauss-Newton method for fitting of nonlinear
regression functions by least squares.’ Technometrics, 3, 269–280.

Hoffmann, K. (1977). ‘Robust alternatives of the least squares estimator.’ Math. Ope- 
rationsforsch. Statist., Ser. Statist., 8, 305—311. 

Huber, P. J. (1964). ‘Robust estimation of a location parameter.’ Ann. Math. Statist.,
35, 73–101.

Jennrich, R. I. (1969). ‘Asymptotic properties of nonlinear least squares estimators.’
Ann. Math. Statist., 40, 633–643.

Kruskal, W. (1968). ‘When are Gauss-Markov and least squares estimators identical? 
A coordinate-free approach.’ Ann. Math. Statist., 39, 70—75. 

Läuter, H. (1989). ‘Note on the strong consistency of the least squares estimator in
nonlinear regression.’ Statistics, 20, 2.

Lawton, W. H., and Sylvestre, E. A. (1971). ‘Elimination of linear parameters in nonlinear
regression.’ Technometrics, 13, 461–467.

McGilchrist, C. A. (1968). ‘Efficient difference equation estimators in exponential
regression.’ Ann. Math. Statist., 39, 1938–1945.

Malinvaud, E. (1970). Statistical Methods of Econometrics (2nd rev. ed.). North-Holland 
Publishing Company, Amsterdam — London. 

Marquardt, D. W. (1963). ‘An algorithm for least squares estimation of nonlinear
parameters.’ SIAM J. Appl. Math., 11, 431–441.

Michel, R. (1975). ‘An asymptotic expansion for the distribution of asymptotic maxi- 
mum likelihood estimators of vector parameters.’ J. Multivariate Anal. 5, 67—82. 

Natanson, I. P. (1955). Konstruktive Funktionentheorie. Akademie-Verlag, Berlin. 

Nelder, J. A. (1961). ‘The fitting of a generalization of the logistic curve.’ Biometrics, 
18, 89—110. 

Nelder, J. A. (1962). ‘An alternative form of a generalized logistic function.’ Biometrics, 
18, 614—616. 

Nussbaum, M. (1977). ‘Asymptotic efficiency of estimators in the multivariate linear 
model.’ Math. Operationsforsch. Statist., Ser. Statist., 8, 173—198. 

Petersen, I. (1969). ‘Comparison of the method of reproducing kernels with the method 
of least squares.’ Izv. AN Eston. SSR, Fiz. Mat., 18, 403 (in Russian). 

Pfanzagl, J. (1973). ‘Asymptotic expansion related to minimum contrast estimators.’ 
Ann. Statist., 1, 993—1026. 

Pfanzagl, J. (1973b). ‘Asymptotically optimum estimation and test procedures.’ 
Proc. Prague Conf. on Asymptotic Methods of Statistics, 1, 201—272. 

Rasch, D. (1967). Schätzprobleme bei eigentlich nichtlinearen Regressionsfunktionen. Abh.
Dt. Akad. Wiss., 121–128.

Ratkowski, D. A. (1983). Nonlinear Regression Modeling: A Unified Practical Approach. 
(Statistics: Textbooks and Monographs Series, Vol. 48), Marcel Dekker, Inc., New 
York. 

Saleh, A. E., and Choudry, G. H. (1975). ‘On fitting exponential regressions.’ Statistische 
Hefte, 16, 213—222. 

Schmidt, F. (1983). Kleinste Quadrat Schätzung in nichtlinearen Regressionsmodellen.
Vandenhoeck & Ruprecht, Göttingen.

Schmidt, W. H. (1975a). ‘Asymptotic normality of least-squares estimators in multi- 
variate singular linear models.’ Math. Operationsforsch. Statist., 6, 285—300. 

Schmidt, W. H. (1975b). ‘Asymptotic optimality of estimators in multivariate linear 
models.’ Math. Operationsforsch. Statist., 6, 713—731. 




Schmidt, W. H. (1976). ‘Strong consistency of variance estimation and asymptotic 
theory for tests of the linear hypothesis in multivariate linear models.’ Math. Opera- 
tionsforsch. Statist., 7, 701—705. 

Schmidt, W. H. (1977). ‘Asymptotics in multivariate linear models with optimal experi- 
mental designs.’ Math. Operationsforsch. Statist., 8, 447—452. 

Schmidt, W. H. (1979). ‘Asymptotic results for estimation and testing variances in 
regression models.’ Math. Operationsforsch. Statist., Ser. Statist., 10, 209 —236. 

Schmidt, W. H., and Zwanzig, S. (1986). ‘Second order asymptotics in nonlinear re- 
gression.’ J. Multivariate Anal., 18, 187—215. 

Schönfeld, P. (1969). Methoden der Ökonometrie, Bd. 1: Lineare Regressionsmodelle.
Verlag Franz Vahlen GmbH, Berlin und Frankfurt (Main).

Stoer, J. (1972). Numerische Mathematik I. Springer-Verlag, Berlin. 

Wisotzki, C. (1977). ‘Polynomial approximation of nonlinear regression functions.’ 
Math. Operationsforsch. Statist., Ser. Statist., 8, 313—321. 

Wu, Chien-Fu (1981). ‘Asymptotic theory of nonlinear least squares estimation.’ Ann. 
Statist., 9, 501—513. 

Zwanzig, S. (1980). ‘Inadequate least squares.’ Math. Operationsforsch. Statist., Ser. 
Statist., 11, 23—48. 


1.4.2 References for Section 1.2


Agarwal, G.G., and Studden, W. J. (1978). ‘Asymptotic design and estimation using 
linear splines.’ Commun. Statist. — Simula. Computa., B7, 309—319. 

Ahlberg, J. H., Nilson, H. N., and Walsh, J. L. (1967). The Theory of Splines and Their 
Applications. Academic Press, New York. 

Bacon, D. W., and Watts, D. G. (1971). ‘Estimating the transition between two
intersecting straight lines.’ Biometrika, 58, 525–534.

Barnard, G. A. (1959). ‘Control charts and stochastic processes.’ J. Royal Statist. Soc.,
Ser. B, 21, 239–271.

Borodjuk, W. P., and Lezki, E. K. (1977). Grundlagen der Verfahrenstechnik und
chemischen Technologie. Statistische Modellierung verfahrenstechnischer Systeme.
Akademie-Verlag, Berlin.

Brown, R. L., Durbin, J., and Evans, J. M. (1975). ‘Techniques for testing the
consistency of regression relationships over time (with discussion).’ J. Royal Statist. Soc.,
Ser. B, 37, 149–192.

Bunke, H. (1973). ‘Approximation of regression functions.’ Math. Operationsforsch.
Statist., 4, 314–325.

Bunke, H., and Bunke, O. (1974). ‘Das empirische Entscheidungsprinzip und die Wahl
von Regressionsmodellen.’ Biometr. Zeitschr., 16, 167–184.

Bunke, H., and Bunke, O. (Eds.) (1986). Statistical Inference in Linear Models. John
Wiley, Chichester.

Bunke, H., and Schulze, U. (1984). ‘Approximation of change points in regression models.’
Proc. of the 1st Intern. Tampere Seminar on Linear Statist. Models and their
Applications, Univ. of Tampere.

Buse, A., and Lim, L. (1977). ‘Cubic splines as a special case of restricted least squares.’
J. Amer. Statist. Assoc., 72, 64–68.

Chow, G. C. (1960). ‘Tests of equality between sets of coefficients in two linear
regressions.’ Econometrica, 28, 591–605.

Dathe, H. M., and Miller, P. H. (1980). ‘A contribution to spline regression.’ Biometr. J.,
22, 259–269.

Dumncz, B. L. (1969). ‘Discontinuities in the surface structure of alcohol-water
mixtures.’ Kolloid-Zeitschr. u. Zeitschrift f. Polymere, 230, 346–357.


Eder, F. X. (1968). Moderne MeBmethoden der Physik, Teil 1. VEB Deutscher Verlag 
der Wissenschaften, Berlin. 

Ertel, J. E., and Fowlkes, E. B. (1976). ‘Some algorithms for linear spline and piecewise
multiple linear regression.’ J. Amer. Statist. Assoc., 71, 640–648.

Fair, R. C., and Jaffee, D. M. (1972). ‘Methods of estimation for markets in
disequilibrium.’ Econometrica, 40, 497–514.

Farley, J. U., and Hinich, M. J. (1970). ‘A test for a shifting slope coefficient in a linear 
model.’ J. Amer. Statist. Assoc., 65, 1320—1329. 

Feder, P.I. (1975a). ‘On asymptotic distribution theory in segmented regression problems 
— identified case.’ Ann. Statist., 3, 49—83. 

Feder, P. I. (1975b). ‘The log likelihood ratio in segmented regression.’ Ann. Statist., 3,
84–97.

Ferreira, P. H. (1975). ‘A Bayesian analysis of a switching regression model: known 
number of regimes.’ J. Amer. Statist. Assoc., 70, 370—374. 

Gallant, A. R., and Fuller, W. A. (1973). ‘Fitting segmented polynomial regression 
models whose join points have to be estimated.’ J. Amer. Statist. Assoc., 68, 144—147. 

Garbade, K. (1977). ‘Two methods for examining the stability of regression coefficients.’
J. Amer. Statist. Assoc., 72, 54–63.

Goldfeld, S. M., and Quandt, R. E. (1972). Nonlinear Methods in Econometrics.
North-Holland Publ. Comp., Amsterdam.

Goldfeld, S. M., and Quandt, R. E. (1973). ‘A Markov model for switching regressions.’
J. Econometrics, 1, 3–16.

Guthery, S. B. (1974). ‘Partition regression.’ J. Amer. Statist. Assoc., 69, 945—947. 

Hackl, P. (1980). Testing the Constancy of Regression Models over Time. Angew.
Statistik u. Ökonometrie, Heft 16. Vandenhoeck & Ruprecht, Göttingen.

Halpern, E. F. (1973). ‘Bayesian spline regression when the number of knots is unknown.’
J. Royal Statist. Soc., Ser. B, 35, 347–360.

Hinkley, D. V. (1969). ‘Inference about the intersection in two-phase regression.’ Bio- 
metrika, 56, 495—504. 

Hinkley, D. V. (1971). ‘Inference in two-phase regression.’ J. Amer. Statist. Assoc., 66, 
736—743. 

Holbert, D., and Broemeling, L. (1977). ‘Bayesian inferences related to shifting sequences 
and two-phase regression.’ Comm. Statist.-Theor. Meth., A6, 265—275. 

Hudson, D. J. (1966). ‘Fitting segmented curves whose join points have to be estima- 
ted.’ J. Amer. Statist. Assoc., 61, 1097—1124. 

Jennrich, R. I. (1969). ‘Asymptotic properties of nonlinear least squares estimators.’ 
Ann. Math. Statist. 40, 633 —643. 

Jupp, D. L. B. (1978). ‘Approximation to data by splines with free knots.’ SIAM J. 
Num. Anal., 15, 328—343. 

McGee, V. E., and Carleton, 7. W. (1970). ‘Piecewise regression.’ J. Amer. Statist. Assoc., 
65, 1109—1124. 

MacNeill, I. B. (1978). ‘Properties of sequences of partial sums of polynomial regression 
residuals with applications to test for change of regression at unknown times.’ Ann. 
Statist., 6, 422 —433. 

Park, S. H. (1978). ‘Experimental designs for fitting segmented polynomial regression 
models.’ Technometrics., 20, 151—154. 

Paul, R. (1974). Halbleiterphysik. VEB Verlag Technik, Berlin. 

Physik in Übersichten. (1973). Volk und Wissen, Berlin.

Poirier, D. J. (1973). ‘Piecewise regression using cubic splines.’ J. Amer. Statist. Assoc., 
68, 515—524. 

Quandt, R. E. (1958). ‘The estimation of the parameters of a linear regression system 
obeying two separate regimes.’ J. Amer. Statist. Assoc., 53, 873 —880. 




Quandt, R. E. (1960). ‘Tests of the hypothesis that a linear regression system obeys two 
separate regimes.’ J. Amer. Statist. Assoc., 55, 324—330. 

Quandt, R. E. (1972). ‘A new approach to estimating switching regression.’ J. Amer.
Statist. Assoc., 67, 306–310.

Quandt, R. E., and Ramsey, J. B. (1978). ‘Estimating mixtures of normal distributions
and switching regression. (With discussion).’ J. Amer. Statist. Assoc., 73, 730–752.

Ramsey, J. B. (1969). ‘Tests for specification errors in classical linear least-squared
regression analysis.’ J. Royal Statist. Soc., 31, 350–371.

Robison, D. E. (1964). ‘Estimates for the points of intersection of two polynomial
regressions.’ J. Amer. Statist. Assoc., 59, 214–224.

Roy, S. N. (1953). ‘On a heuristic method of test construction and its use in multivariate 
analysis.’ Ann. Math. Statist., 24, 220—238. 

Schmidt, P., and Sickles, R. (1977). ‘Further evidence on the use of the Chow test under
heteroscedasticity.’ Econometrica, 45, 1293–1298.

Schulze, U. (1973). ‘Regressionsmodelle mit verschiedenen Zuständen.’ Diplomarbeit,
Humboldt-Universität, Berlin.

Schulze, U. (1977a). ‘Estimation of the unknown change-point between regression
regimes.’ IKM VII. Intern. Kongr. über Anwendungen d. Math. in den
Ingenieurwissensch. mit d. Rahmenthema: Anwendungen d. elektronischen Datenverarbeitung
im Bauwesen, Weimar 1975.

Schulze, U. (1977b). ‘Identifikation von Zustandsänderungen.’ Poster session, 3. Intern.
Sommerschule ‘Modellwahl’, Mühlhausen.

Schulze, U. (1982). ‘Modelle mit Zustandsänderungen.’ Dissertation. Akademie der
Wissenschaften der DDR, Berlin.

Sprent, P. (1961). ‘Some hypotheses concerning two phase regression lines.’ Biometrics, 
17, 634—645. 

Thalheim, W. (1977). ‘Prüfung linearer Modelle.’ Diplomarbeit, Humboldt-Universität,
Berlin.

Toyoda, T. (1974). ‘Use of the Chow test under heteroscedasticity.’ Econometrica, 42,
601–608.

Wold, S. (1974). ‘Spline functions in data analysis.’ Technometrics, 16, 1–11.

Yayatissa, W. A. (1977). ‘Tests of equality between sets of coefficients in two linear
regressions when disturbance variances are unequal.’ Econometrica, 45, 1291–1292.


1.4.3 References for Section 1.3 


Agarwal, G. G., and Studden, W. J. (1980). ‘Asymptotic integrated mean square error 
using least squares and bias minimizing splines.’ Ann. Statist., 8, 1307—1325. 

Balakrishnan, A. V. (1976). Applied Functional Analysis. Springer-Verlag, New York. 

Beran, R. (1982). ‘Robust estimation in models for independent non-identically
distributed data.’ Ann. Statist., 10, 415–428.

Birgé, L. (1983). ‘Approximation dans les espaces métriques et théorie de l’estimation.’
Z. Wahrsch. verw. Gebiete, 65, 181–237.

Bunke, O. (1985). ‘A nonparametric small sample theory of estimation of regression 
functions.’ Proc. Fourth Pannonian Symp. on Math. Statistics, Bad Tatzmannsdorf 1983, 
North-Holland Publ. Co, Amsterdam. 

Cheng, K. F., and Lin, P. E. (1981). ‘Nonparametric estimation of a regression function.’
Z. Wahrsch. verw. Gebiete, 57, 223–233.

Collomb, G. (1981). ‘Estimation non-paramétrique de la régression: revue
bibliographique.’ Internat. Statist. Review, 49, (1), 75–93.

Cox, D. D. (1983). ‘Asymptotics for M-type smoothing splines.’ Ann. Statist., 11, 530 
to 551. 


1.4. References 131 
ere een a a ee sty Shee 


Cox, D. D. (1984). ‘Multivariate smoothing spline functions.’ SIAM J. Numer. Anal., 
21, 789—813. 

Craven, P., and Wahba, G. (1979). ‘Smoothing noisy data with spline functions.’ Nu- 
merische Math., 31, 377—403. 

Davis, K. B. (1977). ‘Mean integrated square error properties of density estimates.’ 
Ann. Statist., 5, 580—535. 

Fedotov, A. M. (1981). ‘An information inequality for operator equations in Hilbert 
space.’ Theory Probab. Appl., 26, 377 —384 (in Russian). 

Fedotov, A. M. (1982). Linear Ill-posed Problems with Random Errors in the Data. 
Nauka, Novosibirsk (in Russian). 

Gasser, Th., and Miller, H.-G. (1979). ‘Kernel estimation of regression functions.’ In: 
Smoothing Techniques for Curve Estimation (Th. Gasser, M. Rosenblatt, Eds.). Lecture 
Notes in Math. 757, 23—68, Springer-Verlag New York. 

Geman, S., and Hwang, C. R. (1982). ‘Nonparametric maximum liklihood estimation 
by the method of sieves.’ Ann. Statist., 10, 401—414. 

Golubev, G. K. (1982). ‘On minimax filtering of functions in L,.’ Problems Inform. 
Transmission, 18 (4), 67—75 (in Russian). 

Golubev, G. K. (1984). ‘On minimax estimation of regression.’ Problems Inform. Trans- 
mission, 20 (1), 56—64 (in Russian). 

Golubev, G. K. (1987). ‘Adaptive asymptotically minimax estimates of smooth signals.’ 
Problems Inform. Transmission, 28 (1), 57—67 (in Russian). 

Hajek, J. (1973). ‘Local asymptotic minimax and admissibility in estimation.’ Proc. of 
the Sixth Berkeley Symposium on Math. Stat. and Probability. University of California 
Press, Vol. 1, 175—194. 

Hall, P. (1983). ‘Large sample optimality of least squares cross-validation in density 
estimation.’ Ann. Statist., 11, 1156—1174. 

Hall, P. (1984). ‘Asymptotic properties of integrated square error and cross-validation 
for the kernel estimation of a regression function.’ Z. Wahrsch. verw. Gebiete, 67, 
175—196. 

Hardle, W. (1984). ‘Robust regression function estimation.’ J. Multivar. Anal., 14, 
169—180. 

Haérdle, W., and Marron, J. S. (1985). ‘Optimal bandwidth selection in nonparametric 
regression function estimation.’ Ann. Statist. 18, 1465—1481. 

Ibragimov, I. A., and Khasminski, R. Z. (1980a). ‘Asymptotic properties of some non- 
parametric estimates in Gaussian white noise.’ In: Proceedings of the Summer School 
in Math. Statistics (Varna, 1978), BAN, Sofia (in Russian). 

Ibragimov, I. A., and Khasminski, R. Z. (1980b). ‘Asymptotic efficiency bounds for 
nonparametric estimation of a regression function in L,. Zapiski nauénych seminarov 
LOMI, 97, 88—101 (in Russian). 

Ibragimov, I. A., and Khasminski, R. Z. (1981). Statistical Estimation: Asymptotic 
Theory. Springer-Verlag, New York. 

Ibragimov, I. A., and Khasminski, R. Z. (1982). ‘Bounds for the risk of nonparametric 
estimates of regression.’ Theory Probab. Appl., 27, 81—-94 (in Russian). 

Ibragimov, I. A., and Khasminski, R. Z. (1982b). ‘On density estimation within a class of 
entire functions.’ Theory Probab. Appl., 27, 514—524 (in Russian). 

Koryakin, A.I. (1983). ‘Estimation of a function from randomized observations.’ 
Zhurn. vychisl, mat. i matemat. fiziki, 28 (1), 21—28 (in Russian). 

Li, Ker-Chau (1982). ‘Minimaxity of the method of regularization on stochastic pro- 
cesses.’ Ann. Statist., 11, 141—156. 

Li, Ker-Chau (1984). ‘Consistency of cross-validated nearest neighbor estimates in 
nonparametric regression.’ Ann. Statist., 12, 230—240. 

Liero, H. (1982). ‘On the maximal deviation of the kernel regression function estimate.’ 
Math. Operat. Statist., Ser. Statistics, 18 (2), 171—182. 


Q* 


132 Chapter 1. Parameter estimation and testing hypotheses in nonlinear models 


Makowski, G. (1974). ‘A rate of convergence of a distribution connected with integral 
regression function estimation.’ Ann. Statist., 2, 829—832. 

Millar, P. W. (1979). ‘Asymptotic minimax theorems for the sample distribution 
function.’ Z. Wahrsch. verw. Gebiete, 48, 233 —252. 

Millar, P. W. (1982). ‘Optimal estimation of a general regression function.’ Ann. Statist., 
10, 717—740 

Miller, H. G. (1984). ‘Smooth optimum kernel estimators of densities, regression curves 
and modes.’ Ann. Statist., 12, 766-—774. 

Nemirovski, A. S., Polyak, B. T., and Tsybakov, A. B. (1984). ‘Signal processing by the 
nonparametric maximum likelihood method.’ Problems Inform. Transmission, 20 (3), 
29—46 (in Russian). 

Nussbaum, M. (1982). ‘Optimal L,-convergence rates for estimates of a multiple re- 
gression function.’ Preprint P-Math-07/82. Academy of Sciences GDR. 

Nussbaum, M. (1985). ‘Spline smoothing in regression models and asymptotic effi- 
ciency in L,.’ Ann. Statist. 18, 984—997. 

Nussbaum, M. (1986). ‘Nonparametric estimation of a regression function which is 
smooth on a domain of R*.’ Theory Probab. Appl. 31, 118—125 (in Russian). 

Pinsker, M.S. (1980). ‘Optimal filtering of square integrable signals in Gaussian white 
noise.’ Problems Inform. Transmission, 16 (2), 52—68 (in Russian). 

Pinsker, M.S8., and Yefroimovich, S. Yu. (1981). ‘Estimation of a square-integrable 
spectral density from a sequence of observations.’ Problems Inform. Transmission 17 
(3), 50—68 (in Russian). 

Pinsker, M.S., and Yefroimovich, S. Yu. (1982). ‘Estimation of a square-integrable 
probability density of a random variable.’ Problems Inform. Transmission, 18 (3), 
'19—38 (in Russian). 

Pinsker, M.S., and Yefroimovich, S. Yu. (1984). ‘A learning algorithm for nonpara- 
metric filtering.’ Avtomatika i Telemekhanika, 11, 58—65 (in Russian). 

Prakasa Rao, B. L. S. (1984). Functional Estimation. John Wiley, New York. 

Priestley, M.B., and Chao, M.T. (1972). ‘Nonparametric function fitting.’ J. Royal 
Statist. Soc., Ser. B, 34, 385—392. 

Ragozin, D. (1983). “Error bounds for derivative estimates based on spline smoothing of 
exact or noisy data.’ J. Approx. Theory, 37, 335—355. 

Rice, J., and Rosenblatt, M. (1981). ‘Integrated mean square error of a smoothing spline.’ 
J. Approx. Theory, 38, 353—369. 

Rice, J., and Rosenblatt, M. (1983). ‘Smoothing splines: regression, derivatives and 
deconvolution.’ Ann. Statist., 11, 141—156. 

Rice, J. (1984). ‘Bandwidth choice for nonparametric regression.’ Ann. Statist. 12, 
1215—1230. 

Sacks, J.,and Strawderman, W. (1982). ‘Improvements on linear minimax estimates.’ In: 
Statistical Decision Theory and Related Topics, III, 2 (S. Gupta, ed.). Academic Press, 
New York. 

Speckman, P. (1985). ‘Spline smoothing and optimal rates of convergence in nonpara- 
metric regression models.’ Ann. Statist., 18, 970—983. 

Stone, C. (1980). ‘Optimal rates of convergence for nonparametric estimators.’ Ann. 
Statist., 8, 1348 —1360. 

Stone, C. (1982). Optimal global rates of convergence for nonparametric regression.’ 
Ann. Statist., 10, 1040—1053. 

Susarla, V., and Walter, G. (1981). “Estimation of a multivariate density function using 
delta sequences.’ Ann. Statist., 9, 347—355. 

T'sybakov, A. B. (1982). ‘Nonparametric estimation of signals with incomplete infor- 
mation on the noise distribution.’ Problems Inform. Transmission, 18 2, 44—60 (in 
Russian). 


1.4. References 133 


Utreras, F. (1980). ‘Sur le choix du paramétre d’ajustement par le lissage par fonctions 
spline.’ Nwmerische Math., 34, 15—28. 

Utreras, F. (1983). ‘Natural spline functions; their associated eigenvalue problem.’ 
Numerische Math., 42, 107—117. 

Van der Linde, A. (1986). ‘Interpolation of regression functions in reproducing kernel 
Hilbert spaces.’ Statistics, 17, 351—361. 

Wahba, G. (1975). ‘Optimal convergence properties of variable knot, kernel and ortho- 
gonal series methods for density estimation.’ Ann. Statist, 3, 15—29. 

Wahba, G. (1978). ‘Improper priors, spline smoothing and the problem of guarding 
against model errors in regression.’ J. Roy. Statist. Soc., Ser. B, 40, 364—372. 

Wahba, G. (1981). ‘Data-based optimal smoothing of orthogonal series density estimates.’ 
Ann. Statist., 9, 146—156. 

Watson, G. S., and Leadbetter, M. R. (1963). ‘On the estimation of the probability den- 
sity I.’ Ann. Math. Statist., 33, 1065—1076. 


Chapter 2 


Robust statistical inference in linear models 


2.1 General remarks on robustness


The least squares and the standard normal theory are very attractive for their 
flexibility and the wide applicability to complex linear models. They are good 
provided that the normal distribution is reasonably close to the real problem 
at hand and when outliers are of little concern. 

We know in practice that most models will seldom fit the real situations 
exactly. Thus we must pay attention to the question of what happens to 
specific techniques and procedures if the hypotheses on which they are devel- 
oped do not hold. Stigler’s (1973) historical studies show that already Laplace, 
Edgeworth, Newcomb, and Daniell (among many other earlier investigators) 
cared very much about the influence of the basic assumptions. While they 
recognized that mistakes would be made by using incorrect models, they prob- 
ably had no idea how bad the errors really could be, lacking the computer 
backup. 

E. S. Pearson (1931) may have been the first to note the high sensitivity
of some standard procedures (namely of the test for equality of variances)
to deviations from normality. Incidentally, in connection with the same problem,
Box (1953) first used the term ‘robustness’.

In the late 1940s Tukey and the Statistical Research Group at Princeton
began to emphasize the problem, to show the shortcomings of the classical
estimators, and to establish properties of several really practicable alternatives
to them, mainly for the case of estimating a single location parameter. They
rediscovered and investigated the $\alpha$-trimmed mean. Later Tukey (1962)
remarked: ‘We need to face up to more realistic problems. The fact that normal
theory, for instance, may offer the only framework in which some problem
can be tackled simply or algebraically may be a very good reason for starting
with the normal case, but never can be a good reason for stopping there.’

Moreover, it follows from results of Kagan, Linnik, and Rao (1965, 1973) 
that the least squares estimator coincides with Pitman’s estimator correspon- 
ding to the quadratic loss if and only if the basic distribution is normal; this 
means that the admissibility of the least squares estimator with respect to 
quadratic loss is a characteristic property of the normal law (see also section 
2.1.7 of Bunke and Bunke, 1986). 


The classical procedures are highly sensitive to gross errors (i.e. to
outliers and long-tailed distributions): 10% of outliers with standard
deviation $3\sigma$ contribute a variance equal to that of the remaining 90% of
the cases with standard deviation $1\sigma$. Outliers can thus double or triple the
variance, so that cutting out their effect could really increase the precision.
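
The arithmetic behind this statement can be checked directly. The following is only a sketch: it assumes a two-component mixture with a common mean, in which a share 0.1 of the observations has standard deviation 3 and the rest has standard deviation 1 (both in units of $\sigma$); the function name is illustrative.

```python
from fractions import Fraction

def variance_contributions(outlier_share, outlier_sd, clean_sd):
    """Return (outlier part, clean part) of the variance of a two-component
    mixture with a common mean: share * sd**2 for each component."""
    clean_share = 1 - outlier_share
    return outlier_share * outlier_sd**2, clean_share * clean_sd**2

# 10% outliers with sd 3, 90% clean cases with sd 1 (units of sigma)
out_part, clean_part = variance_contributions(Fraction(1, 10), 3, 1)
total = out_part + clean_part   # variance of the contaminated sample
```

Both parts equal 9/10, so under these assumptions the contaminated variance is 9/5 of the clean one, i.e. close to double.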

In the light of these facts, we must seek statistical procedures that are good 
not only for one model but also for a broad class of possible underlying models; 
they need not be necessarily best for any one of them. Box and Anderson 
(1955) introduced the notion ‘robustness’ as follows: procedures are required 
which are ‘robust’ (insensitive to changes in extraneous factors not under test) 
as well as powerful (sensitive to specific factors under test). 

When speaking about robustness, we must keep in mind two points. First, 
the set of distributions (or parameters, or vectors of observations) over which 
the procedure is to be robust. The set may consist of the normal distribution 
only, or it may be the set of all symmetric smooth distributions, a selected 
finite set of distribution shapes, a neighbourhood of one shape, etc. Second, we 
must know the property of the procedure which has to be robust. We may be con- 
cerned with the stability of confidence levels, of the power, of the variance, etc. 

One possible formal definition of robustness was given by Hampel (1971):
for a sequence $\{T_n\}$ of estimators, small deviations in the basic distribution
of the observations should cause only small deviations in the distributions of
the estimates (both measured by Prokhorov’s distance); this holds up to a
‘breakdown point’, the greatest distance from the supposed model at which
the estimator still tells us something.

This chapter provides a review of some results on robust estimation in the 
linear model. The area of robust estimation and testing has been a permanent 
focus of scientific interest during the last 20 years. The first version of the 
chapter was prepared in 1976; since then the area has undergone considerable 
development. Hence, the text is far from being a complete review of robust- 
ness, though much of the material on this subject may be found by consulting 
the references. The most complete review of robust procedures, with an
emphasis on the M-estimators, can be found in Huber’s recent monograph (Huber,
1981). Some results on robust estimators can also be found in the monographs
by Serfling (1980) and Lehmann (1983) and in the extensive paper by Bickel
(1981). Robust tests and estimators are also investigated, with an emphasis
on the sequential modifications, in the monograph by Sen (1981). On the other
hand, the relations of different estimators being considered in this chapter 
are not often mentioned in the literature. 

Most of the considerations will be asymptotic for the number n of observa- 
tions increasing infinitely. The exact distributions are available only in several 
special cases; all the above mentioned monographs are mainly based on the 
asymptotic theory as well. Some finite-sample results, based on Monte Carlo 
considerations, can be found, among others, in the Princeton Study (Andrews 
et al., 1972). 




2.2 Robust alternatives to the method of least squares 


We shall consider the problem of estimating the regression parameters of a
linear model. We want to estimate $\beta$ after observing
$y_{(n)} = (y_{1n}, \ldots, y_{nn})'$, where

$$y_{(n)} = X_n \beta + \varepsilon, \tag{1}$$

$\beta = (\beta_1, \ldots, \beta_p)'$ is a vector of unknown regression parameters,
$\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$ is a vector of errors and
$X_n = (x_{ij})_{i=1,\ldots,n;\ j=1,\ldots,p}$ is a matrix of known regression
constants (design matrix) of rank $p$. Most of our considerations will be
asymptotic as the number of observations $n$ becomes large and the number of
regression parameters $p$ remains fixed. Thus, the coordinates of $y_{(n)}$ and of
$X_n$ depend on $n$; we shall not indicate this dependence explicitly unless
confusion could arise. Throughout we shall suppose that $\varepsilon_i$,
$i = 1, \ldots, n$, are independent and identically distributed with the common
distribution function $F$ and density $f$ with respect to Lebesgue measure;
$F$ and $f$ are generally unspecified.

If $F$ is normal with mean 0, the appropriate procedure is to minimize the
sum of squares

$$\sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 \tag{2}$$

or, equivalently, to solve the system of equations

$$\sum_{i=1}^{n} \Big( y_i - \sum_{k=1}^{p} x_{ik}\beta_k \Big) x_{ij} = 0, \qquad j = 1, \ldots, p. \tag{3}$$

The least squares estimator

$$\hat\beta = W_n^{-1} X_n' y_{(n)}, \qquad \text{where } W_n = X_n' X_n, \tag{4}$$

is admissible with respect to the quadratic loss if and only if $F$ is normal (see
2.1.7 of Bunke and Bunke, 1986).

For the location submodel ($p = 1$, $x_{i1} = 1$) three different classes of
estimation procedures alternative to (4) were considered: M-estimators (estimators
of maximum likelihood type), R-estimators (estimators based on ranks of 
observations) and L-estimators (linear combinations of order statistics). 
These procedures lead — in a more or less straightforward way — to exten- 
sions to a linear regression model. 

We shall work with the residuals

$$\delta_i(\beta) = y_i - \sum_{j=1}^{p} x_{ij}\beta_j, \qquad i = 1, \ldots, n. \tag{5}$$

A common idea of all these procedures is to replace the function (2) to be
minimized by some other function less sensitive to the extreme values of the
residuals (5).


2.2.1 L-estimators 


In the location submodel, L-estimators are linear combinations of order
statistics. If $y^{(1)} < \cdots < y^{(n)}$ are the ordered observations, the
estimators are of the form

$$\hat\beta = \sum_{i=1}^{n} \lambda_i y^{(i)}. \tag{6}$$

If the coefficients $\lambda_i$ are generated by a suitably chosen weight function $J$
such that $\int_0^1 J(u) F^{-1}(u)\,du = 0$ (this condition guarantees the
identifiability of the parameter), so that
$\lambda_i = n^{-1} J\big(\frac{i}{n+1}\big)$, $i = 1, \ldots, n$, and various other
regularity conditions are satisfied (see Bickel, 1967; Chernoff, Gastwirth, and
Johns, 1967; Shorack, 1969; Stigler, 1974), then $n^{1/2}(\hat\beta - \beta)$ is
asymptotically normal with mean 0 and variance

$$K_1(J, F) = \iint J(F(x))\, J(F(y))\, [F(\min(x, y)) - F(x)F(y)]\,dx\,dy. \tag{7}$$
If $F$ is known, then

$$J(t) = \frac{\gamma'(F^{-1}(t))}{I(f)},$$

where

$$\gamma(x) = -\frac{f'(x)}{f(x)}, \qquad I(f) = \int \gamma^2(x) f(x)\,dx,$$

and $F^{-1}(t) = \inf\{x : F(x) \ge t\}$, yields an asymptotically efficient estimator,
i.e. one which achieves the information inequality lower bound as $n \to \infty$
(Jung, 1955; cf. also Theorem 2.4.12 of Bunke and Bunke, 1986).

Of particular interest from the point of view of robustness are the
$\alpha$-trimmed means corresponding to

$$J(t) = \begin{cases} \dfrac{1}{1 - 2\alpha} & \text{if } \alpha \le t \le 1 - \alpha, \\ 0 & \text{otherwise.} \end{cases}$$
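
A minimal sketch of such an L-estimator in the location submodel is given below. It assumes the weights $\lambda_i = n^{-1} J(i/(n+1))$ and the trimming weight function described above; the function names and the sample are illustrative only.

```python
def trimmed_weight(alpha):
    """Weight function J of the alpha-trimmed mean, 0 <= alpha < 1/2."""
    def J(t):
        return 1.0 / (1.0 - 2.0 * alpha) if alpha <= t <= 1.0 - alpha else 0.0
    return J

def l_estimate(y, J):
    """Linear combination of order statistics with weights J(i/(n+1))/n."""
    ordered = sorted(y)
    n = len(ordered)
    return sum(J(i / (n + 1)) / n * v for i, v in enumerate(ordered, start=1))

sample = [1.0, 2.0, 3.0, 4.0, 100.0]                 # one gross outlier
plain_mean = l_estimate(sample, trimmed_weight(0.0))  # J = 1: ordinary mean
trimmed = l_estimate(sample, trimmed_weight(0.2))     # extremes get weight 0
```

For this sample the untrimmed version is pulled to 22 by the single outlier, while the trimmed version stays near 3. For a finite $n$ the weights generated in this way need not sum exactly to one; they do so only in the limit.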


L-estimators are computationally appealing and have further attractive 
properties in the location model (cf. Bickel and Lehmann, 1975). However, 
they do not extend to the linear model in a straightforward way. A possible 
regression analogue of L-estimators was suggested and studied by Bickel (1973). 
His estimators, defined in the two-step way with the aid of an initial estimator, 
have good efficiency properties but they are computationally complex. 


Koenker and Bassett (1978) extended the concept of the sample quantile to
the regression model in the following way: for $\alpha \in (0, 1)$, the
$\alpha$-regression quantile $\hat\beta(\alpha)$ is defined as a solution of the
minimization problem

$$\sum_{i=1}^{n} \rho_\alpha\Big( y_i - \sum_{j=1}^{p} x_{ij} t_j \Big) \to \min$$

with respect to $t = (t_1, \ldots, t_p)'$, where
$\rho_\alpha(y) = y(\alpha - I[y < 0])$, $y \in \mathbb{R}$. The
solution of this minimization is generally not uniquely determined but we can
always give a rule which selects one of the set of solutions (the asymptotic
behaviour of estimates is independent of this rule). The regression quantiles
seem to provide a basis for an extension of L-estimators and related procedures
to the linear model. Koenker and Bassett (1978) also proposed the trimmed least
squares estimator, which is defined in the following way: trim off all
observations satisfying

$$y_i - \sum_{j=1}^{p} x_{ij}\hat\beta_j(\alpha) \le 0 \quad \text{or} \quad y_i - \sum_{j=1}^{p} x_{ij}\hat\beta_j(1 - \alpha) \ge 0$$

($0 < \alpha < 1/2$) and then calculate the ordinary least squares estimator from
the remaining observations. The same authors proposed a modified linear
programming algorithm for the computation of the regression quantiles. From
the computational point of view, the trimmed least squares estimator can
be recommended for practical applications.

Ruppert and Carroll (1980) showed that the trimmed least squares estimator
is, under some regularity conditions, asymptotically normal with the covariance
matrix $\sigma^2(\alpha, F) W_n^{-1}$, with $\sigma^2(\alpha, F)$ being the asymptotic
variance of the trimmed mean and $W_n = X_n' X_n$. This estimator was also
studied by Jurečková (1983a, b, 1984) under general conditions.
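
To illustrate the criterion behind regression quantiles, the sketch below treats only the location submodel ($p = 1$, $x_{i1} = 1$), where the minimization reduces to a sample quantile. It is not the linear programming algorithm of Koenker and Bassett; it simply searches the observations, which suffices because the objective is piecewise linear and convex with kinks at the data points. All names are illustrative.

```python
def check_loss(u, alpha):
    """rho_alpha(u) = u * (alpha - I[u < 0])."""
    return u * (alpha - (1.0 if u < 0 else 0.0))

def location_quantile(y, alpha):
    """One minimizer of sum_i rho_alpha(y_i - t) over t, searched among the
    observations; ties are resolved in favour of the smallest candidate."""
    def objective(t):
        return sum(check_loss(v - t, alpha) for v in y)
    return min(sorted(y), key=objective)

sample = [1.0, 2.0, 3.0, 4.0, 100.0]
median = location_quantile(sample, 0.5)
lower_quartile = location_quantile(sample, 0.25)
```

Taking the smallest minimizer is one possible selection rule of the kind mentioned above for non-unique solutions.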


2.2.2 M-estimators 


We obtain M-estimators of regression parameters if we minimize

$$\sum_{i=1}^{n} \rho\Big( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Big) \tag{8}$$

instead of (2), where $\rho$ is some (usually convex) function. If we differentiate (8),
we obtain (with $\psi = \rho'$) the following system of equations:

$$\sum_{i=1}^{n} \psi\Big( y_i - \sum_{k=1}^{p} x_{ik}\beta_k \Big) x_{ij} = 0, \qquad j = 1, \ldots, p, \tag{9}$$

which is equivalent to (8) if $\rho$ is convex.


The class of M-estimators was established by Huber (1964, 1967) for the
location model and extended by Relles (1968) and Huber (1973) to the regression
model. A detailed investigation of M-estimators can be found in Huber
(1981).

If $f$ is smooth and if $\psi = -f'/f$, then the M-estimator coincides with the
maximum likelihood estimator. Moreover, we obtain the least squares
estimator (4) if $f$ is normal.

M-estimators are generally not scale-equivariant, i.e. they generally do not
satisfy

$$\hat\beta(ky_1, \ldots, ky_n) = k\hat\beta(y_1, \ldots, y_n) \qquad (k > 0).$$


To make an M-estimator scale-equivariant, we should supplement it by an 
appropriate estimator of scale. 

Under various regularity conditions, the above authors proved that the
M-estimator is asymptotically normally distributed (as $n \to \infty$ and $p$ is fixed)
with centre $\beta$ and covariance matrix $K(\psi, F) W_n^{-1}$, where

$$K(\psi, F) = \int \psi^2(x)\,dF(x) \Big[ \int f(x)\,d\psi(x) \Big]^{-2}. \tag{10}$$


A more detailed investigation of M-estimators may be found in Section 2.3. 
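
As a numerical illustration of formula (10), the sketch below evaluates $K(\psi, F)$ for one particular choice that is assumed here purely for the example: $F$ standard normal and the clipped function $\psi(x) = \max(-k, \min(k, x))$. In that case both integrals have closed forms in terms of the normal distribution function and density; the helper names are illustrative.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def clipped_psi_variance(k):
    """K(psi, Phi) for psi(x) = max(-k, min(k, x)) and F standard normal.

    int psi^2 dPhi  = (2 Phi(k) - 1) - 2 k phi(k) + 2 k^2 (1 - Phi(k))
    int f dpsi      = P(|X| <= k) = 2 Phi(k) - 1
    """
    central_mass = 2.0 * normal_cdf(k) - 1.0
    second_moment = (central_mass - 2.0 * k * normal_pdf(k)
                     + 2.0 * k * k * (1.0 - normal_cdf(k)))
    return second_moment / central_mass ** 2

K = clipped_psi_variance(1.345)
efficiency = 1.0 / K   # relative to psi(x) = x, for which K = 1 under Phi
```

With $k = 1.345$ the relative efficiency at the normal model comes out at roughly 0.95, and for very large $k$ the value of $K$ approaches 1, the least squares case.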


2.2.3 R-estimators 


Hodges and Lehmann (1963) suggested estimators of location based on the 
Wilcoxon and other rank tests; they showed that their asymptotic variances 
could be computed from the power functions of the tests, and that the estima- 
tors never have much lower but sometimes infinitely higher efficiencies than 
the sample mean. 
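
For the Wilcoxon signed-rank test, the resulting location estimate is commonly computed as the median of the pairwise (Walsh) averages. That closed form is quoted here as a standard fact rather than derived from the text above, and the sketch below is only an illustration with hypothetical names.

```python
from itertools import combinations_with_replacement
from statistics import median

def hodges_lehmann(y):
    """Median of the Walsh averages (y_i + y_j)/2 over all pairs i <= j."""
    walsh = [(a + b) / 2.0 for a, b in combinations_with_replacement(y, 2)]
    return median(walsh)

sample = [1.0, 2.0, 3.0, 4.0, 100.0]
estimate = hodges_lehmann(sample)          # barely affected by the outlier
sample_mean = sum(sample) / len(sample)    # pulled far to the right
```

The direct computation uses all $n(n+1)/2$ pairs and is therefore quadratic in $n$, which is acceptable for small samples.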

Adichie (1967), following ideas of Hodges and Lehmann, defined estimates
of $\beta_1$ and $\beta_2$ in the regression model
$y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$, $i = 1, \ldots, n$, based
on the Wilcoxon test and found their asymptotic distribution. Jurečková (1971a),
Koul (1971) and Jaeckel (1972) then extended the procedure to p-parameter
regression and to the general rank tests. The three corresponding estimators
are asymptotically equivalent and thus have the same asymptotic distribution
and efficiency.

Roughly speaking, we obtain an R-estimator if we minimize, instead of (2),

$$\sum_{i=1}^{n} \big( a_n(R_i^{\beta}) - \bar a_n \big)\, \delta_i(\beta) \tag{11}$$

with respect to $\beta = (\beta_1, \ldots, \beta_p)'$. Here $R_i^{\beta}$ is the rank of
$\delta_i(\beta)$ in $(\delta_1(\beta), \ldots, \delta_n(\beta))$, $a_n(\cdot)$ is some
monotone score function, and $\bar a_n = n^{-1} \sum_{i=1}^{n} a_n(i)$.

If we differentiate (11), which is a piecewise linear convex function of $\beta$,
we obtain the approximate equalities at the minimum:

$$\sum_{i=1}^{n} \big( a_n(R_i^{\beta}) - \bar a_n \big) x_{ij} \approx 0, \qquad j = 1, \ldots, p. \tag{12}$$

These approximate equations in turn can be reconverted into a minimization
problem, e.g.

$$\sum_{j=1}^{p} \Big| \sum_{i=1}^{n} \big( a_n(R_i^{\beta}) - \bar a_n \big) x_{ij} \Big| \to \min. \tag{13}$$

The variant (13) was investigated by Jurečková (1971a) who proved its
asymptotic normality. This variant is a direct generalization of Hodges and
Lehmann (1963) and of Adichie (1967) to the p-parameter regression; the
estimators are derived by inverting rank tests for hypotheses about $\beta$. The variant
(11) was investigated by Jaeckel (1972) who also proved the asymptotic
equivalence of both procedures. The idea is that (11) could be taken as a measure
of dispersion of the residuals $\delta_i(\beta)$; in fact, if $z = (z_1, \ldots, z_n)'$ are
observations and $R_1, \ldots, R_n$ their respective ranks, then
$D(z) = \sum_{i=1}^{n} (a_n(R_i) - \bar a_n) z_i$ is translation invariant,
$D(bz) = bD(z)$ for $b \ge 0$ and $D(z)$ is small if the $z_i$ are close
to each other. We thus minimize $D(\delta(\beta))$ instead of the proper variance of the
residuals, as is done by the method of least squares.
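
The stated properties of the dispersion measure can be verified numerically. The sketch below assumes Wilcoxon-type scores $a_n(i) = i/(n+1) - 1/2$ and observations without ties; these choices and all names are illustrative.

```python
def wilcoxon_scores(n):
    """Scores a_n(i) = i/(n+1) - 1/2, i = 1..n (an assumed choice)."""
    return [i / (n + 1) - 0.5 for i in range(1, n + 1)]

def dispersion(z):
    """D(z) = sum_i (a_n(R_i) - mean score) * z_i, R_i = rank of z_i."""
    n = len(z)
    scores = wilcoxon_scores(n)
    mean_score = sum(scores) / n
    order = sorted(range(n), key=lambda i: z[i])   # indices by increasing z
    total = 0.0
    for rank0, i in enumerate(order):              # rank0 = R_i - 1
        total += (scores[rank0] - mean_score) * z[i]
    return total

z = [0.3, -1.2, 2.5, 0.9]
d = dispersion(z)
d_shifted = dispersion([v + 10.0 for v in z])   # translation invariance
d_scaled = dispersion([3.0 * v for v in z])     # D(bz) = b D(z), b >= 0
```

Up to rounding, shifting all observations leaves the value unchanged because the centred scores sum to zero, and multiplying by a positive constant preserves the ranks and scales the value accordingly.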

The score function $a_n(i)$ is supposed to be generated by a nonconstant,
nondecreasing square-integrable function $\varphi(t)$, $0 < t < 1$, in the following way:

$$a_n(i) = \varphi\Big( \frac{i}{n+1} \Big), \qquad i = 1, \ldots, n. \tag{14}$$

If $f$ is known and smooth, then

$$\varphi(t) = \varphi(t, f) = -\frac{f'(F^{-1}(t))}{f(F^{-1}(t))}, \qquad 0 < t < 1, \tag{15}$$

yields an asymptotically efficient estimator.

Under some regularity conditions the estimators are asymptotically normal
with mean $\beta$ and the covariance matrix $K_2(\varphi, F) W_n^{-1}$, where

$$K_2(\varphi, F) = \Big[ \int_0^1 \varphi^2(t)\,dt - \Big( \int_0^1 \varphi(t)\,dt \Big)^2 \Big] \Big[ \int_0^1 \varphi(t)\,\varphi(t, f)\,dt \Big]^{-2}. \tag{16}$$


Besides the solution of (11) or (13), the estimators allow two-step versions:
start with some reasonably good preliminary estimate, and then apply one
step of Newton’s method to the corresponding system of equations. Such an
estimate was investigated by Kraft and van Eeden (1972) (see also Section 2.5.2).

From the above remarks we learn that the three estimation procedures follow
the same idea: to decrease the possible influence of outlying observations.
Each of them could lead to an asymptotically efficient estimator in the case
that the basic distribution is known. In fact, as $n \to \infty$, the estimators are
closely related to one another. For instance, suppose that the respective $J$-,
$\psi$-, and $\varphi$-functions are smooth and connected together in the following way:

$$J(t) = \psi'(F^{-1}(t)) \Big[ \int_0^1 \psi'(F^{-1}(u))\,du \Big]^{-1}, \qquad \psi(x) = c\,\varphi(F(x)), \quad c > 0; \tag{17}$$

then the corresponding L-, M-, and R-estimators are asymptotically equivalent
in probability. The relations (17) depend explicitly on the unknown
distribution $F$, hence we are not able, for instance, to calculate the value of the
M-estimator from the known value of the R-estimator, and so on. These
relations rather show which classes of estimators correspond to each other. The
asymptotic relations of different types of estimators are studied in Section 2.6.


2.3 Properties of M-estimators 


Let us consider the model (2.2.1) under the assumption that we have some
approximate knowledge of the underlying distribution $F$; for instance, suppose
that $F$ satisfies the following model of indeterminacy established by Huber:

$$F = (1 - \varepsilon)\Phi + \varepsilon H, \tag{1}$$

where $0 < \varepsilon < 1$ is a known number, $\Phi(x)$ is the standard normal cumulative
distribution function, and $H$ is an unknown symmetric contaminating distribution.
Such a ‘model of contamination’ arises, e.g., if the observations are assumed to
be normal with variance 1, but a fraction $\varepsilon$ of them is affected by gross errors.
We shall also consider another model of indeterminacy, e.g.

$$\sup_{x \in \mathbb{R}} |F(x) - \Phi(x)| \le \varepsilon. \tag{2}$$

Huber (1964) proposed to take

$$\rho(x) = \begin{cases} \tfrac{1}{2}x^2 & \text{if } |x| \le k, \\ k|x| - \tfrac{1}{2}k^2 & \text{if } |x| > k, \end{cases} \tag{3}$$

and

$$\psi(x) = \begin{cases} x & \text{if } |x| \le k, \\ k\,\operatorname{sign} x & \text{if } |x| > k, \end{cases} \tag{4}$$

respectively, for some $k > 0$. This choice of $\psi$ effectively limits the influence
of grossly erroneous observations: once a residual exceeds $k$ in absolute value,
it can be increased beyond any bounds without further changes in the
estimated value of $\beta$. Alternative choices of $\psi$ will be mentioned in Section 2.3.2.
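
The bounded influence just described can be seen in a small experiment. The sketch below treats only the location case and solves $\sum_i \psi(y_i - t) = 0$ by bisection; it assumes that $k$ is expressed in the units of the data (no scale estimate is used), and the function names are illustrative.

```python
def huber_psi(x, k):
    """psi(x) = x for |x| <= k and k*sign(x) otherwise."""
    return max(-k, min(k, x))

def huber_location(y, k, tol=1e-10):
    """Solve sum_i psi(y_i - t) = 0 for t by bisection.

    The sum is continuous and nonincreasing in t, nonnegative at min(y)
    and nonpositive at max(y), so a root lies in [min(y), max(y)]."""
    lo, hi = min(y), max(y)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(huber_psi(v - mid, k) for v in y) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

est_100 = huber_location([1.0, 2.0, 3.0, 4.0, 100.0], k=1.5)
est_1000 = huber_location([1.0, 2.0, 3.0, 4.0, 1000.0], k=1.5)
```

Both calls return a value of about 3: moving the outlier from 100 to 1000 does not change the estimate, because its residual is already clipped at $k$.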


The $\psi$ given by (4) leads to estimates with well-defined asymptotic and
finite-sample minimax properties in the special case of location ($p = 1$,
$x_{i1} = 1$); at least the asymptotic minimax property carries over to the
regression case.


2.3.1 Finite sample minimax properties of M-estimators 
in the location model 


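Suppose that $y_1 - \beta, \ldots, y_n - \beta$ are independent identically distributed
errors whose common distribution function $F$ satisfies the model of indeterminacy
(2). Define

$$T^* = \sup\Big\{ \tau : \sum_{i=1}^{n} \psi(y_i - \tau) > 0 \Big\}, \qquad T^{**} = \inf\Big\{ \tau : \sum_{i=1}^{n} \psi(y_i - \tau) < 0 \Big\} \tag{5}$$

with $\psi$ given by (4), and put

$$T^0 = \begin{cases} T^* & \text{with probability } \tfrac{1}{2}, \\ T^{**} & \text{with probability } \tfrac{1}{2}, \end{cases} \tag{6}$$

where the randomization does not depend on the $y_i$.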
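
The two endpoints and the randomized estimator can be computed by bisection, since $\sum_i \psi(y_i - \tau)$ is nonincreasing in $\tau$. The sketch below assumes the clipped function $\psi(x) = \max(-k, \min(k, x))$ and uses the fact that, for a nonincreasing sum, the infimum of the points where it is negative equals the supremum of the points where it is nonnegative. Names and the sample are illustrative.

```python
import random

def psi_sum(y, t, k):
    """sum_i psi(y_i - t) with the clipped psi; nonincreasing in t."""
    return sum(max(-k, min(k, v - t)) for v in y)

def boundary(y, k, holds, tol=1e-10):
    """Approximate sup{t : holds(psi_sum(y, t, k))}, assuming that holds is
    true for small t and false for large t."""
    lo, hi = min(y) - k - 1.0, max(y) + k + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if holds(psi_sum(y, mid, k)):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def t_star(y, k):
    return boundary(y, k, lambda s: s > 0)     # sup{t : sum > 0}

def t_double_star(y, k):
    return boundary(y, k, lambda s: s >= 0)    # equals inf{t : sum < 0}

def t_zero(y, k, rng=random):
    """Choose T* or T** with probability 1/2 each, independently of y."""
    return t_star(y, k) if rng.random() < 0.5 else t_double_star(y, k)

y = [0.0, 10.0]
lower, upper = t_star(y, 1.0), t_double_star(y, 1.0)
```

For this two-point sample with $k = 1$ the sum vanishes on the whole interval from 1 to 9, so the two endpoints differ and the randomization actually matters; in most samples the two values coincide.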
The M-estimator $T^0$ of location has the following minimax property: if
$a > 0$ is a fixed number and $k$ in (4) depends on $\varepsilon$ and on $a$ through the
relation

$$e^{-2ak}\,\Phi(a - k) - \Phi(-a - k) = \varepsilon\,(1 + e^{-2ak}), \tag{7}$$

then $T^0$ minimizes the supremum of the inaccuracy function,

$$\sup_{F \in \mathcal{F},\ \beta \in \mathbb{R}} \max[\,P(T < \beta - a),\ P(T > \beta + a)\,] \tag{8}$$

over all estimators $T$ of $\beta$; $\mathcal{F}$ is the set of distributions satisfying (2).

Remark 2.3.1 One may ask why just the inaccuracy function (8) is taken as
a measure of performance of the estimator instead of, e.g., the variance. In the
finite-sample case the variance is not an adequate measure for robust
estimators: a long-tailed distribution of the observations may lead to an infinite
variance of otherwise reasonable estimators.

For instance, the variance of the sample mean is infinite for the distribution 


1 1h 
with the density f(z) = 1 if |¢| < 5 and f(x) = Upy ae the va- 


32 |ax\§ 
riance of the sample mean does not exist for any distribution the variance 
of which does not exist, such as the Cauchy distribution. 




The minimax property of $T^0$ is expressed in the following theorem:


Theorem 2.3.1 (Huber, 1968) Let $T^0$ be defined by (5) and (6) with the function
$\psi$ satisfying (4) and (7). Then

(i) $T^0$ is translation equivariant.

(ii) If $y_1, \ldots, y_n$ are independent identically distributed random variables such
that the distribution of $y_1 - \beta$ belongs to the system of distributions defined by (2),
then $T^0$ minimizes (8) over all estimators of $\beta$.


Proof. The idea is the following: one first constructs a minimax test of $\beta - a$
against $\beta + a$ and then one derives an estimate from this test in the manner
of Hodges and Lehmann (1963); it coincides with $T^0$.

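Let $f_0$ denote the density of the standard normal distribution; let $P_0$ and
$P_1$ be the probability distributions defined by the respective densities:

$$p_0(x) = f_0(x + a), \qquad p_1(x) = f_0(x - a). \tag{9}$$

The likelihood ratio $p_1(x)/p_0(x) = e^{2ax}$ is strictly monotone increasing. Introduce
the following families of probability distributions:

$$\begin{aligned} \mathcal{P}_0 &= \{ Q : Q\{(-\infty, t)\} \ge P_0\{(-\infty, t)\} - \varepsilon \ \text{for all } t \in \mathbb{R} \}, \\ \mathcal{P}_1 &= \{ Q : Q\{(t, \infty)\} \ge P_1\{(t, \infty)\} - \varepsilon \ \text{for all } t \in \mathbb{R} \}. \end{aligned} \tag{10}$$

Suppose that $\mathcal{P}_0 \cap \mathcal{P}_1 = \emptyset$ (this is the case for sufficiently
small $\varepsilon$). We shall construct the minimax test $\xi$ of $\mathcal{P}_0$ against
$\mathcal{P}_1$, i.e. the test which minimizes

$$\max\Big[ \sup_{Q_0' \in \mathcal{P}_0} E_{Q_0'}(\xi),\ \sup_{Q_1' \in \mathcal{P}_1} E_{Q_1'}(1 - \xi) \Big]. \tag{11}$$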


These minimax tests happen to have a simple structure in our case. We shall
show that there is a ‘least favourable’ pair $Q_0 \in \mathcal{P}_0$, $Q_1 \in \mathcal{P}_1$
such that, for every probability ratio test $\xi$ of $Q_0$ against $Q_1$,

$$\begin{aligned} E_{Q_0'}(\xi) &\le E_{Q_0}(\xi) \quad \text{for all } Q_0' \in \mathcal{P}_0, \\ E_{Q_1'}(\xi) &\ge E_{Q_1}(\xi) \quad \text{for all } Q_1' \in \mathcal{P}_1, \end{aligned} \tag{12}$$

i.e. which satisfies the assumptions of [A 3.4].
We shall show that one version of $Q_0$ and $Q_1$ is given by the densities

$$q_0(x) = \begin{cases} (1 + e^{-2ak})^{-1}\,[p_0(x) + p_1(x)] & x < -k, \\ p_0(x) & |x| \le k, \\ (1 + e^{2ak})^{-1}\,[p_0(x) + p_1(x)] & x > k, \end{cases} \tag{13}$$

and

$$q_1(x) = \begin{cases} (1 + e^{2ak})^{-1}\,[p_0(x) + p_1(x)] & x < -k, \\ p_1(x) & |x| \le k, \\ (1 + e^{-2ak})^{-1}\,[p_0(x) + p_1(x)] & x > k, \end{cases} \tag{14}$$

where $k$ satisfies (7). Hence the probability ratio of $Q_0$ and $Q_1$ satisfies


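$$\log \frac{q_1(x)}{q_0(x)} = 2a\,\psi(x) \tag{15}$$

and the corresponding probability ratio test between $Q_0$ and $Q_1$ is of the form

$$\xi(z) = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} \psi(z_i) > K, \\ \kappa & \text{if } \sum_{i=1}^{n} \psi(z_i) = K, \\ 0 & \text{if } \sum_{i=1}^{n} \psi(z_i) < K; \end{cases} \tag{16}$$

the constants $K$ and $\kappa$ are adjusted so that

$$E_{Q_0}\,\xi(z) = E_{Q_1}(1 - \xi(z)) = \alpha, \tag{17}$$

where $\alpha$ is the minimax risk. The symmetry of the case implies that $K = 0$,
$\kappa = \tfrac{1}{2}$.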

It remains to show that the test is really minimax, i.e. that it satisfies (12).
We shall first verify that

$$\begin{aligned} Q_0'\Big\{ \frac{q_1(z_1)}{q_0(z_1)} < t \Big\} &\ge Q_0\Big\{ \frac{q_1(z_1)}{q_0(z_1)} < t \Big\}, \\ Q_1'\Big\{ \frac{q_1(z_1)}{q_0(z_1)} < t \Big\} &\le Q_1\Big\{ \frac{q_1(z_1)}{q_0(z_1)} < t \Big\} \end{aligned} \tag{18}$$

for all $Q_j' \in \mathcal{P}_j$, $j = 0, 1$. But (18) is trivially true for
$t \le e^{-2ak}$ and $t > e^{2ak}$; for $e^{-2ak} < t \le e^{2ak}$ the result follows
from (10).

Suppose that the distribution of $z_i$ belongs to $\mathcal{P}_0$, $i = 1, \ldots, n$;
then (18) implies that

$$\prod_{i=1}^{n} \frac{q_1(z_i)}{q_0(z_i)}$$

is the largest stochastically provided the $z_i$ are identically distributed according
to $Q_0$. Analogously, if the distribution of $z_i$ belongs to $\mathcal{P}_1$,
$i = 1, \ldots, n$, then the same product is the smallest stochastically provided the
$z_i$ are identically distributed according to $Q_1$. This further implies (12), so that
the likelihood ratio test of $Q_0$ against $Q_1$ is really minimax for the problem.

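Now, for any $P$,

$$\begin{aligned} P(T^0 > \beta) &= \tfrac{1}{2} P(T^* > \beta) + \tfrac{1}{2} P(T^{**} > \beta) \\ &\le \tfrac{1}{2} P\Big\{ \sum_{i=1}^{n} \psi(y_i - \beta) > 0 \Big\} + \tfrac{1}{2} P\Big\{ \sum_{i=1}^{n} \psi(y_i - \beta) \ge 0 \Big\} \\ &= E_P\,\xi(y_1 - \beta, \ldots, y_n - \beta) \end{aligned}$$

and similarly

$$P(T^0 < \beta) \le E_P[1 - \xi(y_1 - \beta, \ldots, y_n - \beta)].$$

In particular, for $Q_j' \in \mathcal{P}_j$, $j = 0, 1$,

$$Q_0'(T^0 > 0) \le E_{Q_0'}\,\xi(y) \le \alpha, \qquad Q_1'(T^0 < 0) \le E_{Q_1'}[1 - \xi(y)] \le \alpha. \tag{19}$$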


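If the distribution of $y_1 - \beta$ belongs to $\mathcal{F}$, then that of $y_1 - \beta - a$ and of
$y_1 - \beta + a$ belong to $\mathcal{P}_0$ and $\mathcal{P}_1$, respectively. On the other hand,
$T^*$, $T^{**}$, and $T^0$ are translation equivariant:
$T(u_1 + \theta, \ldots, u_n + \theta) = T(u_1, \ldots, u_n) + \theta$, $\theta \in \mathbb{R}$.
This implies that for any $P \in \mathcal{F}$,

$$P(T^0(y) < \beta - a) = P(T^0(y - (\beta - a)1_n) < 0) = Q_1'(T^0(y) < 0) \le \alpha, \tag{20}$$

where $Q_1' \in \mathcal{P}_1$ denotes the distribution of $y_1 - \beta + a$, and similarly,

$$P(T^0(y) > \beta + a) \le \alpha. \tag{21}$$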


Let $T$ be any translation-equivariant estimator. Then its distribution function is continuous under $Q_0$ as well as under $Q_1$. Indeed,

$$T(x_1, \dots, x_n) = x_1 + T(0, x_2 - x_1, \dots, x_n - x_1),$$

so that

$$T(x_1, \dots, x_n) = t \quad \text{if and only if} \quad x_1 = t - T(0, x_2 - x_1, \dots, x_n - x_1).$$

Consequently, given $(x_2 - x_1, \dots, x_n - x_1) = (y_2, \dots, y_n)$, there is exactly one point $(x_1, \dots, x_n)$ for which $T(x_1, \dots, x_n) = t$; namely, $x_1 = t - T(0, y_2, \dots, y_n)$ and $x_i = y_i + x_1$, $i = 2, \dots, n$. This implies (noting the fact that $Q_0$ and $Q_1$ are absolutely continuous) that

$$Q_j\{T(X_1, \dots, X_n) = t \mid X_2 - X_1 = y_2, \dots, X_n - X_1 = y_n\} = 0$$

for every $(y_2, \dots, y_n)$ and every $t$ $(j = 0, 1)$; hence

$$Q_j(T(X_1, \dots, X_n) = t) = 0 \quad \text{for every } t \ (j = 0, 1),$$

which was to be proved. In particular, we have $Q_j(T = 0) = 0$, $j = 0, 1$.
The estimator $T$ can be used as a test statistic for testing $\mathcal{P}_0$ against $\mathcal{P}_1$, rejecting $\mathcal{P}_0$ if $T > 0$. Then

$$\sup_{Q_0' \in \mathcal{P}_0,\ Q_1' \in \mathcal{P}_1} \max\big[Q_0'(T > 0),\ Q_1'(T < 0)\big] \ge \alpha \tag{22}$$

because $\alpha$ is the minimax risk for testing $\mathcal{P}_0$ against $\mathcal{P}_1$. Hence, no translation-equivariant estimator $T$ could be better than $T^0$. In connection with the Hunt-Stein theorem for estimators ([A 3.6]), this implies that $T^0$ is minimax among all estimators of $\beta$.


Remark 2.3.2 The author does not know whether the finite-sample mini- 
max property extends to the regression model. The problem is also that of an 
appropriate measure of performance of the estimators, analogous to (2.2.8). 


2.3.2 Alternative choice of the $\psi$-function


As we have seen, the $\psi$ defined in (4) leads to the estimator of location with the finite-sample minimax property over a neighbourhood of the normal distribution. We will see in subsection 2.3.4.2 that this function provides an estimator of the regression parameter vector which is asymptotically minimax over the family of $\varepsilon$-contaminated normal distributions.

Let us mention some other possible $\psi$-functions which may be appropriate in different situations.


(a) $\psi(x) = x$, $x \in \mathbb{R}^1$. The corresponding estimator is the least squares estimator. If we want to limit the influence of gross observational errors, then it is intuitively clear that $\psi$ should be a bounded function.

(b) $\psi(x) = \operatorname{sign} x$, $x \in \mathbb{R}^1$. The corresponding estimator is an extension of the sample median to the regression model; it is the sample median in the location case.

It has been argued that the influence of extremely discordant observations should be reduced to zero; this means that one should choose a $\psi(x)$ which vanishes for large $x$.


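(c) $$\psi(x) = \begin{cases} x & \text{if } |x| \le k, \\ 0 & \text{if } |x| > k. \end{cases}$$

(d) If we solve the asymptotic minimax problem for the $\varepsilon$-contaminated normal distributions with restriction to the functions which vanish for $|x| > c$, we obtain the estimator corresponding to the following function:

$$\psi(x) = \begin{cases} x & \text{if } 0 \le x \le k, \\ b \tanh\big[\tfrac{1}{2} b (c - x)\big] & \text{if } k \le x \le c, \\ 0 & \text{if } c \le x, \\ -\psi(-x) & \text{if } x < 0, \end{cases}$$

with $b$, $k$ depending on $\varepsilon$ (see Huber, 1969).

Other functions vanishing outside an interval:

(e) $$\psi(x) = \begin{cases} x & \text{if } |x| \le a, \\ a \operatorname{sign} x & \text{if } a < |x| \le b, \\ a\,\dfrac{c - |x|}{c - b} \operatorname{sign} x & \text{if } b < |x| < c, \\ 0 & \text{if } c \le |x| \end{cases} \qquad (0 < a < b < c).$$

(f) $$\psi(x) = \begin{cases} x & \text{if } |x| \le a, \\ a\,\dfrac{b - |x|}{b - a} \operatorname{sign} x & \text{if } a < |x| \le b, \\ 0 & \text{if } b < |x| \end{cases} \qquad (0 < a < b).$$

(g) $$\psi(x) = \begin{cases} \sin \dfrac{x}{2a} & \text{if } |x| < 2a\pi, \\ 0 & \text{if } 2a\pi \le |x|. \end{cases}$$

Despite some advantages, the nonmonotone $\psi$-functions should be used with extreme caution: since the corresponding $\rho$ is nonconvex, the iterative determination of the minimum in (2.2.8) may easily lead to a local minimum far away from the true minimum.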
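To make the contrast between monotone and redescending choices concrete, the following sketch codes Huber's clipped identity together with choices (c), (e) and (g) as they are read from the list above, and checks monotonicity on a grid. The function names and the grid are illustrative assumptions, not part of the original text.

```python
import math

def sign(x):
    """Sign of x as -1, 0 or 1."""
    return (x > 0) - (x < 0)

def psi_huber(x, k):
    """Monotone Huber psi: the identity clipped to [-k, k]."""
    return max(-k, min(k, x))

def psi_skipped(x, k):
    """Choice (c): identity inside [-k, k], zero outside."""
    return x if abs(x) <= k else 0.0

def psi_hampel(x, a, b, c):
    """Choice (e): three-part redescending function, 0 < a < b < c."""
    ax = abs(x)
    if ax <= a:
        return x
    if ax <= b:
        return a * sign(x)
    if ax < c:
        return a * (c - ax) / (c - b) * sign(x)
    return 0.0

def psi_sine(x, a):
    """Choice (g): sin(x / (2a)) for |x| < 2*a*pi, zero outside."""
    return math.sin(x / (2.0 * a)) if abs(x) < 2.0 * a * math.pi else 0.0

def is_nondecreasing(psi, grid):
    """True if psi is nondecreasing along the (sorted) grid."""
    vals = [psi(x) for x in grid]
    return all(u <= v + 1e-12 for u, v in zip(vals, vals[1:]))

grid = [i / 10.0 for i in range(-100, 101)]
print(is_nondecreasing(lambda x: psi_huber(x, 1.5), grid))           # monotone choice
print(is_nondecreasing(lambda x: psi_hampel(x, 1.0, 2.0, 4.0), grid))  # redescending choice
```

A nonmonotone $\psi$ corresponds to a nonconvex $\rho$, which is why an iterative minimizer started far from the bulk of the data can stop at a local minimum.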




2.3.3 Computational aspects and numerical algorithms 


Usually the minimization (2.2.8) does not provide scale-equivariant estimators. Hence, estimating the scale parameter simultaneously with regression parameters has been suggested, where the function to be minimized is of the form

$$\sum_{i=1}^{n} \rho\Big(\frac{1}{\sigma}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)\Big) = \min! \tag{23}$$

and the minimum has to be found under the constraint

$$\sum_{i=1}^{n} \chi\Big(\frac{1}{\sigma}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)\Big) = \frac{1}{2}(n - p)\gamma \tag{24}$$

with

$$\gamma = 2E_{\Phi}\,\chi(y) \tag{25}$$

(the expectation is with respect to the normal distribution; (24) and (25) guarantee that the estimator of $\sigma$ is asymptotically unbiased for normal errors), and

$$\chi(x) = x\psi(x) - \rho(x), \qquad \psi(x) = \rho'(x). \tag{26}$$

The minimization (23) under (24) may be proved to be equivalent to the minimization of

$$g(\beta, \sigma) = \sum_{i=1}^{n} \rho\Big(\frac{1}{\sigma}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)\Big)\sigma + a\sigma \tag{27}$$

where

$$a = \frac{1}{2}(n - p)\gamma. \tag{28}$$

We will restrict ourselves to the function $\rho$ defined in (3). Despite its simple form, the minimum of (27) cannot be found by a straightforward calculation but has to be determined iteratively.

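The function $\psi$ thus has the form (4) for a given $k > 0$ and the function $g$ in (27) is convex in $(\beta, \sigma)$. Hence, unless the minimum $(\hat\beta, \hat\sigma)$ occurs on the boundary $\sigma = 0$, it can be equivalently characterized by the $(p + 1)$ equations

$$\sum_{i=1}^{n} \psi\Big(\frac{\delta_i(\beta)}{\sigma}\Big) x_{ij} = 0, \qquad j = 1, \dots, p, \tag{29}$$

$$\sum_{i=1}^{n} \chi\Big(\frac{\delta_i(\beta)}{\sigma}\Big) = a \tag{30}$$

with $\delta_i(\beta) = y_i - \sum_{j=1}^{p} x_{ij}\beta_j$ and $a$ defined in (25)-(28).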




We shall briefly review three algorithms for solving (29) and (30). All algorithms are iterative and improve trial values $\beta^{(m)}$, $\sigma^{(m)}$ to $\beta^{(m+1)}$, $\sigma^{(m+1)}$, $m = 0, 1, 2, \dots$ stepwise.

We shall consider the model (2.2.1) and assume that the errors $\varepsilon_i$ have the same variance $\sigma^2$, $i = 1, \dots, n$. Let $x_i$ denote the $i$th row of $X$; the rank of $X$ is assumed to be equal to $p$.

The following partition of the index set $I = \{1, \dots, n\}$ with respect to the function $\psi$ and to the residuals $\delta_i = \delta_i(\beta)$, $i = 1, \dots, n$, will be used in the sequel:

$$I_- = I_-(\beta, \sigma) = \{i : \delta_i < -k\sigma\},$$
$$I_0 = I_0(\beta, \sigma) = \{i : |\delta_i| \le k\sigma\},$$
$$I_+ = I_+(\beta, \sigma) = \{i : \delta_i > k\sigma\}.$$


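Let $C_0$ be the matrix such that the $i$th row of $C_0$ is equal to $x_i$ for $i \in I_0$, while the other rows of $C_0$ consist of zeros. The gradient of $g$ with respect to $\beta$ for fixed $\sigma$ can be written in the form

$$\nabla g = -\frac{1}{\sigma} C_0' y + \frac{1}{\sigma} C_0' C_0 \beta - k\Big[\sum_{i \in I_+} x_i' - \sum_{i \in I_-} x_i'\Big].$$

The matrix of the second derivatives with respect to $\beta$ is then

$$\frac{\partial^2 g}{\partial \beta\, \partial \beta'} = \frac{1}{\sigma} C_0' C_0.$$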


We shall now give a description of three types of algorithms. 


Algorithm H (adaptation of the nonlinear least squares algorithm) 


We need the starting values 8, o and a tolerance level ¢ > 0 (say e = 107%). 


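(1) Put $m = 0$.

(2) Compute the residuals $\delta_i^{(m)} = y_i - \sum_{j=1}^{p} x_{ij}\beta_j^{(m)}$, $i = 1, \dots, n$.

(3) Compute a new value for $\sigma$ by

$$\big(\sigma^{(m+1)}\big)^2 = \frac{1}{2a} \sum_{i=1}^{n} \psi^2\Big(\frac{\delta_i^{(m)}}{\sigma^{(m)}}\Big) \big(\sigma^{(m)}\big)^2.$$

(4) 'Winsorize' the residuals, i.e. compute

$$\Delta_i^{(m)} = \psi\Big(\frac{\delta_i^{(m)}}{\sigma^{(m+1)}}\Big)\sigma^{(m+1)}, \qquad i = 1, \dots, n.$$

(5) Solve $X'X\tau^{(m)} = X'\Delta^{(m)}$ with respect to $\tau^{(m)}$.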


(6) Put $\beta^{(m+1)} = \beta^{(m)} + q\tau^{(m)}$, where $0 < q < 2$ is an arbitrary relaxation factor.

(7) Stop if the parameters change by less than $\varepsilon$ times their standard deviations: if $|q\tau_j^{(m)}| < \varepsilon\,\sigma^{(m+1)}\sqrt{w_{jj}}$ for all $j = 1, \dots, p$, where $w_{jj}$ is the $j$th diagonal element of the matrix $W = (X'X)^{-1}$, and if $|\sigma^{(m+1)} - \sigma^{(m)}| < \varepsilon\,\sigma^{(m)}$, then go to (9).

(8) Otherwise put $m := m + 1$ and go to (2).

(9) Estimate $\beta$ by $\beta^{(m+1)}$ and $\sigma$ by $\sigma^{(m+1)}$.


The relaxation factor $q$ in step (6) will be chosen as

$$q = \big[E_{\Phi}\,\psi'(y)\big]^{-1} = \big[\Phi(k) - \Phi(-k)\big]^{-1}$$

provided $0 < q < 2$ (if $[\Phi(k) - \Phi(-k)]^{-1} > 1.9$, set $q = 1.9$), where $\Phi$ is the standard normal distribution function. The convergence of the algorithm is proved in Huber (1977).
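The steps of Algorithm H can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the published program: $\psi$ is Huber's clipped identity with constant `k`, the constant $a$ is taken as $\frac{1}{2}(n-p)\gamma$ with $\gamma = E_\Phi \psi^2$, the stopping rule is one reading of step (7), and the names `algorithm_h`, `solve` and `gamma_const` are hypothetical helpers. The small Gaussian elimination routine only serves to keep the sketch free of external libraries.

```python
import math

def huber_psi(x, k):
    """Huber's psi: the identity clipped to [-k, k]."""
    return max(-k, min(k, x))

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gamma_const(k):
    """gamma = E[psi(Z)^2] for Z ~ N(0,1) and Huber's psi."""
    pdf = math.exp(-0.5 * k * k) / math.sqrt(2.0 * math.pi)
    return (2.0 * norm_cdf(k) - 1.0) - 2.0 * k * pdf + 2.0 * k * k * (1.0 - norm_cdf(k))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def algorithm_h(X, y, beta, sigma, k=1.5, eps=1e-6, max_iter=500):
    """Iterate steps (2)-(8); returns the final (beta, sigma)."""
    n, p = len(X), len(X[0])
    a = 0.5 * (n - p) * gamma_const(k)
    q = min(1.0 / (norm_cdf(k) - norm_cdf(-k)), 1.9)  # relaxation factor
    XtX = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    # diagonal elements of W = (X'X)^{-1}
    w_diag = [solve(XtX, [1.0 if r == j else 0.0 for r in range(p)])[j] for j in range(p)]
    for _ in range(max_iter):
        # step (2): residuals
        delta = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # step (3): new scale
        s2 = sum(huber_psi(d / sigma, k) ** 2 for d in delta) * sigma ** 2 / (2.0 * a)
        sigma_new = math.sqrt(s2)
        # step (4): winsorized residuals
        wins = [huber_psi(d / sigma_new, k) * sigma_new for d in delta]
        # step (5): solve X'X tau = X' Delta
        tau = solve(XtX, [sum(X[i][j] * wins[i] for i in range(n)) for j in range(p)])
        # step (6): relaxed update
        beta = [b + q * t for b, t in zip(beta, tau)]
        # step (7): stopping rule (one reading of the criterion)
        small_beta = all(abs(q * t) < eps * sigma_new * math.sqrt(w) for t, w in zip(tau, w_diag))
        small_sigma = abs(sigma_new - sigma) < eps * sigma_new
        sigma = sigma_new
        if small_beta and small_sigma:
            break
    return beta, sigma

# Straight line with one gross outlier (synthetic data, for illustration only)
X = [[1.0, float(i)] for i in range(20)]
y = [1.0 + 2.0 * i + 0.1 * ((7 * i) % 5 - 2) for i in range(20)]
y[5] += 50.0
print(algorithm_h(X, y, [0.0, 0.0], 1.0))
```

The sketch assumes the residuals are not all zero (otherwise the scale update degenerates to $\sigma = 0$, the boundary case excluded in the text).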


Algorithm W (adaptation of the nonlinear weighted least squares algorithm)

Again we start with values $\beta^{(0)}$, $\sigma^{(0)}$ and a tolerance level $\varepsilon > 0$. This algorithm uses a weighted least squares technique and its iteration steps are the same as those of Algorithm H except for the steps (4)-(5), which are as follows:

(4) Calculate the 'weights':

$$p_i^{(m)} = \psi\Big(\frac{\delta_i^{(m)}}{\sigma^{(m+1)}}\Big) \Big/ \frac{\delta_i^{(m)}}{\sigma^{(m+1)}} \quad \text{if } \delta_i^{(m)} \ne 0, \qquad p_i^{(m)} = 1 \quad \text{if } \delta_i^{(m)} = 0$$

$(i = 1, \dots, n)$; define a diagonal matrix $P^{(m)}$ with $p_i^{(m)}$ as its $i$th diagonal element.

(5) Solve

$$X'P^{(m)}X\big(\beta^{(m)} + \tau^{(m)}\big) = X'P^{(m)}y$$

with respect to $\tau^{(m)}$.

The relaxation factor $q$ in step (6) will be put equal to 1. The convergence was proved by Dutter (1975a).
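For the location model (every $x_i \equiv 1$, $p = 1$) the weighted least squares step reduces to a weighted mean, which makes Algorithm W easy to sketch. The code below is an illustration under assumptions: Huber's $\psi$ with constant `k`, $a = \frac{1}{2}(n-p)\gamma$, a stopping rule modelled on step (7) of Algorithm H, and the hypothetical name `algorithm_w_location`.

```python
import math

def huber_psi(x, k):
    """Huber's psi: the identity clipped to [-k, k]."""
    return max(-k, min(k, x))

def gamma_const(k):
    """gamma = E[psi(Z)^2] for Z ~ N(0,1) and Huber's psi."""
    cdf = 0.5 * (1.0 + math.erf(k / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * k * k) / math.sqrt(2.0 * math.pi)
    return (2.0 * cdf - 1.0) - 2.0 * k * pdf + 2.0 * k * k * (1.0 - cdf)

def algorithm_w_location(y, beta, sigma, k=1.5, eps=1e-8, max_iter=500):
    """Reweighting iteration for a single location parameter."""
    n, p = len(y), 1
    a = 0.5 * (n - p) * gamma_const(k)
    for _ in range(max_iter):
        delta = [yi - beta for yi in y]                      # residuals
        s2 = sum(huber_psi(d / sigma, k) ** 2 for d in delta) * sigma ** 2 / (2.0 * a)
        sigma_new = math.sqrt(s2)                            # scale step
        # weights psi(r)/r with r = delta/sigma_new, and weight 1 at r = 0
        w = [huber_psi(d / sigma_new, k) / (d / sigma_new) if d != 0.0 else 1.0
             for d in delta]
        # weighted least squares with q = 1: here a weighted mean
        beta_new = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        converged = (abs(beta_new - beta) < eps * sigma_new / math.sqrt(n)
                     and abs(sigma_new - sigma) < eps * sigma_new)
        beta, sigma = beta_new, sigma_new
        if converged:
            break
    return beta, sigma

# Synthetic sample centred near 10 with one gross outlier
y = [10.0 + 0.1 * ((7 * i) % 5 - 2) for i in range(20)]
y[3] = 100.0
print(algorithm_w_location(y, 0.0, 1.0))
```

Observations with $|\delta_i| \le k\sigma$ keep weight 1, while larger residuals are downweighted in proportion to $k\sigma/|\delta_i|$, so the outlier barely moves the estimate.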
Algorithm S 


Besides starting values f° and o° and a tolerance « > 0, we need an estimate x 
of the ‘downhill’ property of this algorithm (see Dutter, 1975a), which may 
be approximated by x ~ w,/w,, where w, and w, are the smallest and the 
greatest eigenvalue of X’X, respectively. Further, we need an upper bound 




k, for the squared norm of the Hessian matrix H = ts O5Co, which is computed 
by ee 


a Dp 1/2 
6/4, => (2-4) : 
j=l 
The description of the algorithm is then the following: 
(1) 
(2) see Algorithm H 
(3) 


(4) Find the partition (J_, Jy, J,) with respect to (B™, o™). 
(5) Compute the vector 


wm — up, om) aa (O50) [Coy ae ko™*)5] ade pm 


with 
a= ay, — DK. 
46d i 
(6) Tf jw| < ecm Ve, for all j =1,...,p and |o™) — otm| < egtm+h, 


estimate B by B™ + w™ and o by o™” and stop. 
(7) If fh) = p™ + w™ and o™ yield the same partition as in (4), put 


pm) — fm) and go to (8); 
otherwise compute 
ford — Bom + my 


where the relaxation factor y™ is chosen according to the following in- 
struction: 


Define 
aly) = [glB™ + yore, ofm*D) — g(B™, of™)] 


J [vy (w™ J -Vg(p™, gmt) )}-1 


with 


yl, of) = [Chy + homs — fC yp] — 
oC ) 


1 


‘ 1 
and choose 7™ as the largest element in the sequence 1, pete Eas 


for which «(y™) > C, where0 << (<1. 


Go to (9). 
(8) Lf jo™) — o™| < eo(™, estimate 6 by BO and ao by o(”*, and stop 
the procedure. 




| 12 
(9) Compute o(™*?) — om with 


2 


= Ziyi —# — %; (CoCo) * Coy, 
br = BE [ol (C600) (Be, — Da.) + # (E14 D1) — lo —a)y 
i€T g JET 4, jed- ied 5 ie T 


and 
Gers) = (C009) [Cod + hotm*2)s]. 


If the partition (J_, Jo, J,) has not changed, take (B("*», o(™*®) as an 
estimator of (6, «) and stop. 
(10) Put m := m + 1 and go to (2). 


The computation of the matrix $(C_0'C_0)^{-1}$ may cause some difficulties because $C_0'C_0$ may be singular, as discussed in Dutter (1975a).

Algorithm S is preferable if high accuracy is needed; Algorithms H and W are much simpler to code, and a single iteration is simpler and faster to perform. The algorithms and their variants are described in detail in Huber and Dutter (1974) and Dutter (1975a, b); the latter work provides a list of the most important programs and some programs in the form of subroutines.

Dutter (1975b) compares the algorithms and their variants from the point of view of computation times and numbers of iterations.


2.3.4 Asymptotic properties of M-estimators 


The asymptotic theory in this section considers the case that $n \to \infty$ and $p$ is fixed. For some results concerning the case where the number $p$ of parameters is allowed to increase with the number $n$ of observations, we refer to Huber (1973).

The asymptotic normality of M-estimators of regression parameters has been 
proved under various regularity conditions. We shall prove one of the results 
in subsection 2.3.4.1. 

Subsection 2.3.4.2 will be devoted to the asymptotic minimax property of 
M-estimators (and simultaneously that of the relative L- and R-estimators) in a 
neighbourhood of a given unimodal distribution, e.g. the normal distribution. 


2.3.4.1 Asymptotic normality of M-estimators 


For n = 1, 2,..., let us consider the model (2.2.1) under the following system 
of assumptions: 


(A1) $f(x) = F'(x)$ exists, is absolutely continuous, and has finite Fisher's information, i.e.

$$I(F) = \int \Big(\frac{f'(x)}{f(x)}\Big)^2 dF(x) < \infty. \tag{31}$$

Put

$$F^{-1}(t) = \inf\{x : F(x) \ge t\}, \qquad 0 < t < 1. \tag{32}$$


(A2) X, = (oP i up 8 & given (nXp) matrix with the columns z ,, 
j=1,...,p and the rows 4,1 =1,...,0 satisfying the following con- 
ditions (omitting the superscript n): 

0) y=tj+df 1sisn, isjsp 


a 


(b) The vectors 2 = (Hy,...,%4;/, j=1,-.., p, satisty 

why A=eM j=1,..,9, (33) 
where the salar product in (33) is either 0 for al) but a finite number of 
n, positive for a\\ but a finite number of n; if it is positive, then 


lira max (xf)? [z a =0  (Noether’s condition); (34) 
cad | 


twits Amie 


M > 0 is 2 constant independent of n. 
Analogous conditions are to be satisfied by the vectors 27,7 = 1,...,p. 


(c) All pairs j,h = 1,...,p andi,k =1,...,n satisfy 
(4 — th) (hy, — Hes) 20 
(4 — Hy)  — 2g) <0 (35) 
(it — “f) (ot — aif) 20. 
(d) lin W, = W = (wy) %, existe and is a positive definite matrix, 
GrD 
where 
i 
W,= —i4,4,: 
W 
(A3) Let yz), ¢ ¢ BR? be 2 nonconstant nondecreasing function such that 


fre dF(z) < ow. 
2B 


Remark 2.3.3. The assumption (35) of concordance and discordance of the 
vectors £4 a L4,j =1,.---,p, mneans a restriction for the design matrix X,. 
However, it is satisfied on many models, eg., for the polynomial regression 
with 

jas 


Lig = 0, — 


8 




In some situations, the validity of the assumption can be achieved by an appropriate design of experiments.

Let us denote

$$\omega = -\int_{\mathbb{R}^1} \psi(x) f'(x)\,dx \tag{36}$$

and

$$\tau^2 = \int \psi^2(x)\,dF(x) - \Big(\int \psi(x)\,dF(x)\Big)^2 = \operatorname{var}_F \psi(X). \tag{37}$$

Let $\hat\beta^{(n)}$ be the M-estimator of $\beta$ generated by the function $\psi$, i.e. the solution of (2.2.9). Then we have the following result.

Theorem 2.3.2 Suppose that the assumptions (A1), (A2), (A3) are satisfied for $n = 1, 2, \dots$. Then $n^{1/2}(\hat\beta^{(n)} - \beta)$ has an asymptotic $p$-dimensional normal distribution

$$N_p\big(0, (\tau^2/\omega^2)\,W^{-1}\big). \tag{38}$$
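As a numerical illustration of the factor $\tau^2/\omega^2$ in (38), the sketch below evaluates (36) and (37) in closed form for one specific case chosen here as an assumption: Huber's clipped $\psi$ with constant $k$ and standard normal errors, $F = \Phi$. Integration by parts gives $\omega = \int \psi'\,d\Phi = 2\Phi(k) - 1$, and since $\psi$ is odd, $\tau^2 = E_\Phi \min(X^2, k^2)$. The function names are illustrative.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def huber_omega(k):
    """omega = -int psi f' dx = P(|X| <= k) for X ~ N(0,1)."""
    return 2.0 * norm_cdf(k) - 1.0

def huber_tau2(k):
    """tau^2 = var psi(X) = E[min(X^2, k^2)] for X ~ N(0,1)."""
    return ((2.0 * norm_cdf(k) - 1.0) - 2.0 * k * norm_pdf(k)
            + 2.0 * k * k * (1.0 - norm_cdf(k)))

def variance_factor(k):
    """The scalar tau^2 / omega^2 multiplying W^{-1} in (38)."""
    return huber_tau2(k) / huber_omega(k) ** 2

for k in (1.0, 1.345, 1.5, 2.0):
    # efficiency relative to least squares (factor 1) at the normal model
    print(k, variance_factor(k), 1.0 / variance_factor(k))
```

At the normal model the least squares estimator corresponds to the factor 1, so the reciprocal of `variance_factor(k)` can be read as the asymptotic relative efficiency of the M-estimator; it approaches 1 as $k$ grows.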


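Proof. Denote by

$$M_j^{(n)}(\beta) = \sum_{i=1}^{n} x_{ij}\,\psi\Big(y_i - \sum_{k=1}^{p} x_{ik}\beta_k\Big), \qquad j = 1, \dots, p,$$

the right-hand side of the $j$th equation of (2.2.9). We shall approximate $M_j^{(n)}(\beta)$ by a linear function of $\beta$ in the sense of convergence in probability. This will be done in the following theorem.

Theorem 2.3.3 Under the assumptions (A1), (A2), (A3),

$$\max_{\{\beta :\, n^{1/2}\|\beta - \beta^0\| \le K\}} n^{-1/2}\Big| M_j^{(n)}(\beta) - M_j^{(n)}(\beta^0) + n\omega(\beta - \beta^0)'w_j \Big| \xrightarrow{p} 0 \tag{39}$$

as $n \to \infty$, for any $K > 0$, $\varepsilon > 0$ and any fixed $\beta^0 \in \mathbb{R}^p$; $w_j$ is the $j$th column of $W$.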


Proof. We may suppose without a loss of generality that B° = 0. We shall 
first prove (39) for any fixed sequence {8},,.. such that 1/29 — 4 € IR? 
for n = no, ||A|| S K. For convenience, denote 


Oy pe oe gio) (40) 
and 
Pp 
Ny”( (A) = Dduy (yi - aE didn) = n—U2VEn) (Bim) , ae oer 
k=1 
(41) 


For a fixed h, 1=h Sp, let_A, = {6:6, = 0 fork h,k = 1,..., p}. Let 
us fix j, 1 <7 < p. We shall first prove a lemma. 




Lemma 2.3.1 If $\Delta \in A_h$, then $N_j^{(n)}(\Delta)$ has an asymptotic normal distribution $N(-\Delta_h\,\omega\,w_{hj},\ \tau^2 w_{jj})$ as $n \to \infty$; $j = 1, \dots, p$.

Proof of Lemma 2.3.1. For convenience, denote 
d= 4d), 0, = Cr ise ed Reid Pinna ta Bente 


Let us introduce the likelihood ratio 


On account of (A2) and of [A 2.5], the densities [] f(z; + d;) are contiguous 


n w=1 
with respect to the densities [| /(x;) (see also [A 2.2]). Moreover, [A 2.5] implies 
that —<- 


Py How ta — + > AIF) wm 


zo, as 2 —> Co (42) 


where 


1 
The central limit theorem implies that the pair (15%, T, — af Al(F) vn 
is, as n > oo, asymptotically jomtly normal with the parameters 
1 
Wy = 0, Me = —— AL(F) wrp 
: (43) 
o; = TW; 5 oR => Awl (F), O12 = Awwy;. 
It then follows from (41) that (Nj”(0), log Z,) has the same asymptotic di- 
stribution. The asymptotic normality of Nj”(A) then follows from the third 
LeCam lemma (see [A 2.4]). 


Lemma 2.3.2 Under the above assumptions, 


lim max Po{|N}"(4) — N{(0) + A,ow,;| 2 e} = 0 (44) 


n—oo 15jSp 
for any ¢ > O and any fixed A € Ay, 
Proof. Let us keep the notation of the preceding proof. Furthermore, denote 


f= wie) Vo bat. 
Then 


1 
lim if [é(t) — E(t)? dt = 0 


n—co 0 




where 
é = if O<t<— 
m 
&(™)(¢) E(t) if pear 
m m 
‘cos if ees eet 
m m 
TOL Este Otecs 
y™t) = EF), te R 
and 
N™(A) = Ldiv\Yni — Adi), m = 2,8, .-2. 
Then 
n 1 
ELN}(0) — Nf-™O)P S Da? f (£ — EOP dt Se (45) 
s—1 0 


for m > my uniformly in n = 1, 2,.... 


(45), the contiguity of [| f(z; + 4d;) with respect to [] f(z;), and [A 2.6] 
then imply that Ee =i 


max |W{(4) — N{""™(A)| 240, as n> 00. (46) 
1SjSp 
Furthermore, 
Var [N™™(4) — N%™(0)] < Sa di [ [ye —Ad;) — y™(a)P 4F(e). 
2 (47) 


The last integral tends to zero for $m$ fixed and as $n \to \infty$ according to Lebesgue's dominated convergence theorem, for $\psi^{(m)}$, being bounded and nondecreasing, has at most countably many discontinuities. On account of the Chebyshev inequality this implies that, given any $\varepsilon > 0$ and any fixed $m = 2, 3, \dots$,
lim max P,{jN{"™(4) — N¥"™(0) — EN}*™(A)| =e} = 0. (48) 
n—>0o 1S5j75p 

According to Lemma 2.3.1, Nj" (A) isasymptotically normal N(—o™ B,w,;, 
TmW;;), and Nj"™ (0) is asymptotically normal N(0, 7?,w;;) with 


o™ = —f l(a) f'(x) de 
and 
Tm = f (ye)? AF (x) — (f p(x) dF (ay), 




thus, on account of (48) and of lim w'™ = w, we get 


m—->oco 


lim max Po{|N{™™(4) — N%™(0) + A,o™,;| = e} = 0. (49) 


noo 15j]S5p 


The result then follows from (45), (46), (47), and (49). 


Completion of the proof of Theorem 2.3.3 Let us introduce the statistics 
NG(A*, a) = Dai (y. — E att + aeeayy) 
E s (50) 
Np(A*, A**) = Sadie (ys — ¥ (att + astay)) 
and 
I AA* VAST ee NAT, AY teas (Ae, AS). (ai ae ye 


For a fixed j, let x';2,; > 0 for all but a finite number of n. Regarding (35), 
[A 3.8] implies that N¥ is nonincreasing in 4f,..., 45 and nondecreasing in 
At*,..«.¢A5 5 while N}* is nondecreasing in Af, ..., 4, and nonincreasing in 
Ay*, ..., A5* (see also an analogous proof of Theorem 2.1 of Jureckovd 1969). 
Lemma 2.3.2 and the contiguities mentioned above entail that for arbitrary 


fixed A*, A**, 


lim P, | N,(A*, 4**) — N;(0) 
+ oS [AMG a + amar ay] = ‘| 2 (51) 
h=1 


Let Q = {6, ..., 6}, where —K = 0° <...< 6% = K, be a partition of 
[—K, K] such that 


|o(d) — 6@-D)| < c(QpMV?)-1, k=1,...,7. (52) 


Denoting I = {x:||x|| < K}, we get from (52) and from the monotonicity of NF 
that 


p 
max |N*(A*, A**) — N#(0) + o > [ARG4) dh, + A (GY C4") 
A*,A**EL heart 
z * *\/ * KE) TR 
< max Nj (A*, A**) — NF(0) + @ DAG (a) a’, + An (dy) Te t+ €, 
A*, A**eD h=1 


where D = {4 = (A, ..., 4p)’; 4; € Q, & = 1, .--, B}- An analogous proposition 
is valid for N¥*. Thus, we get from (51), (52), and from the analogous inequa- 
lity for N7* that 


lim Pp» 3 max 
n—>00 |4||sx 


£00 j= 1,.. 5): 


p 
NA) — NYO) + wD Aw; 
k=1 


= ‘| =) (53) 


Finally, we have for each fixed n € IN and7 = 1,..., p 


max ‘a 1/2 


Pp 
M(B) es, M‘"(0) ae men 3 BE ey 


n3!?||B |< kK 
n D n 

= max |S ai] (va mS daft) —viond) | tom? Sper 

n3!2||g||< K |i=1 k=1 k=1 
p 

< max |N)"(4) — NYO) + wD) Ape], (54) 

4K k=1 
which, in connection with (53), completes the proof of Theorem 2.3.3. | 


Theorem 2.3.3 has an easy corollary: 


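Corollary 2.3.1 Let $\{\beta^{(n)}\}_{n=1}^{\infty}$ be a sequence of random vectors from $\mathbb{R}^p$ such that $\{n^{1/2}(\beta^{(n)} - \beta^0)\}_{n=1}^{\infty}$ is bounded in probability. Then, under the assumptions (A1), (A2), (A3), it holds for any $\varepsilon > 0$ that

$$n^{-1/2}\Big| M_j^{(n)}(\beta^{(n)}) - M_j^{(n)}(\beta^0) + n\omega(\beta^{(n)} - \beta^0)'w_j \Big| \xrightarrow{p} 0, \tag{55}$$

as $n \to \infty$.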


Completion of the proof of Theorem 2.3.2. The asymptotic distribution of $n^{1/2}(\hat\beta^{(n)} - \beta^0)$ could be derived by means of the above corollary if we knew that this sequence is bounded in probability. The following lemma shows that this is the case.


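Lemma 2.3.3 Under the assumptions (A1), (A2), (A3), to any $\varepsilon > 0$ correspond $K > 0$, $\eta > 0$ and $n_0 \in \mathbb{N}$ such that

$$P_{\beta^0}\Big\{\min_{n^{1/2}\|\beta - \beta^0\| \ge K} n^{-1/2}\big\|M^{(n)}(\beta)\big\| < \eta\Big\} < \varepsilon \tag{56}$$

for $n > n_0$, where

$$M^{(n)}(\beta) = \big(M_1^{(n)}(\beta), \dots, M_p^{(n)}(\beta)\big)'.$$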


Proof. Again we may put f° = 0, First, the sequence {n~/?M\"(0)} is bounded 
in probability for 7 = 1, ..., p, since it has a nondegenerate asymptotic normal 
distribution; thus, there exists an %) € IN and a Ky > 0 such that 


1 
Po{n-"2||M(0)|| > Ko} < ae for n> NM. (57) 
Let K, 7 be any pair of numbers satisfying 
| 1 
K > 2Ko/(Aoo); = > oe 


where A, is the minimal eigenvalue of W. Then 




Theorem 2.3.3 and (57) yield for n > 1, 


Po min 3 6,MIp) < ml Ly (58) 
n1/?||B\|=K j=1 


with 79 = Ky. The left-hand side of (58) is less than or equal to 


Pal min 3° A,M") <m, min Sf, [ro — nw Au 


mr|ip||=K j=1 nil"\B||=K j=1 


oo 2neh aS Po min x B; no) — nw 2 fa << 2reh 


ntl?|[B||=K j=1 


Pp Pp 
<P, | ex 338, sige ~ MY""8) — no ¥ pen > ml 
ni/?||p||=K-j7=1 | k=1 


n?|pB|=K j=1 


<P} max y nk 
*\\B||=K j=1 


M\(B) — Ms”(0) + nw > BW; 
k=1 


“| 


+ Po{—Kn-¥2||M(0)|| < 2 — K2Aqw} > 0 aS n> OO. 


Let p* ie a point with n1/2|6*|| = K. Put 2f = => “ub, t = 1,...,0; then 
j= 
2's) ciy(Yni — T2}) iS nonincreasing in T, so hat for yre= its 
t=1 
P p 
Y (—BF) Mi(76*) = —M(r) = —M(1) = & (—6F) Mj"(6*). (59) 
j=1 


j=1 


Now, if n1/2||6|| > K, then £ = tf*, where B* = n~/?K6/||A|| so that nil2|1B*|| 
= K andr = n¥/2|8|\/K > 1. (58) and (59) then imply that 


Pof min n-¥)MVp)|| < o} 


ntl?||B\|= K 


< Po min > (—f,) M(B) mK IB < nk 


miip||SK j=1 


AN 


Re min y (—B*) M}”(B*) = nl Ze for nS ne. 


ml|B*||=K j=1 
Finally, regarding that $M_j^{(n)}(\hat\beta^{(n)}) = 0$, $j = 1, \dots, p$, we get from Lemma 2.3.3 and from Corollary 2.3.1 that

$$\Big\| n^{1/2}(\hat\beta^{(n)} - \beta^0) - \frac{1}{\omega}\,n^{-1/2}\,W^{-1}M^{(n)}(\beta^0) \Big\| \xrightarrow{p} 0 \tag{60}$$

under $\beta^0$, so that $n^{1/2}(\hat\beta^{(n)} - \beta^0)$ has the same asymptotic distribution as $\frac{1}{\omega}\,n^{-1/2}\,W^{-1}M^{(n)}(\beta^0)$. The asymptotic distribution of the latter sequence is easily found to be that given in (38), because each component of $M^{(n)}(\beta^0)$ is a linear combination of $\psi(y_{n1}), \dots, \psi(y_{nn})$. This completes the proof of Theorem 2.3.2.


2.3.4.2 Asymptotic minimax properties of M-, R-, and L-estimators 


For any sequence $T = \{T_n\}$ of estimators of $\beta$ in the model (2.2.1), let $D(T, F)$ denote the asymptotic variance of $(\lambda'W^{-1}\lambda)^{-1/2}\,\lambda'\,n^{1/2}(T_n - \beta)$ under $F$; $\lambda = (\lambda_1, \dots, \lambda_p)'$ is any vector with $\sum_{j=1}^{p} \lambda_j^2 > 0$.

Considering the model (2.2.1), we may distinguish two situations:

(i) $F$ is known and smooth. Then we may determine an asymptotically efficient estimator (e.g., the maximum likelihood estimator and the optimal R- and L-estimators, respectively).

(ii) $F$ is only approximately known, e.g. it is known to belong to a convex compact neighbourhood $\mathcal{F}$ of a given distribution $G$.

Let $F_0$ be the distribution in $\mathcal{F}$ which has the smallest Fisher information, $I(F_0) = \inf_{F \in \mathcal{F}} I(F)$. Then, for any sequence $T$ of estimators, $D(T, F_0)$ will be at best equal to $1/I(F_0)$; our aim is to find a $T_0$ such that $D(T_0, F)$ does not exceed $1/I(F_0)$ for any $F \in \mathcal{F}$, i.e. a $T_0$ which is asymptotically minimax in the sense that the inequalities

$$D(T_0, F) \le D(T_0, F_0) \le D(T, F_0)$$

hold for any $F \in \mathcal{F}$ and any asymptotically normally distributed sequence $T$ of estimators of $\beta$.

This section presents an explicit solution of this problem for the model of $\varepsilon$-contamination. The following theorem shows that there exists an M-estimator which is asymptotically minimax for this model. Moreover, due to the correspondence between the M-, L-, and R-estimators (see (2.2.17) or Section 2.6), it immediately implies that the classes of L- and R-estimators also contain asymptotically minimax elements. Each of these minimax estimators will be asymptotically efficient for the least favourable distribution $F_0$.
Theorem 2.3.4 (Huber, 1969). Let

$$\mathcal{F} = \{F : F = (1 - \varepsilon)G + \varepsilon H,\ H \in \mathcal{M}\} \tag{61}$$

be a system of distributions with $\varepsilon \in [0, 1)$ being a fixed number; $G$ is a fixed symmetric absolutely continuous distribution function such that $I(G) < \infty$ and that its density $g$ is twice continuously differentiable and $(-\log g)$ is convex; $\mathcal{M}$ is a family of symmetric substochastic measures on $\mathbb{R}^1$, i.e. for each $H \in \mathcal{M}$ we have $H(B) \le 1$ for any $B \in \mathcal{B}^1$. Let $\mathcal{F}_1 \subset \mathcal{F}$ be a convex subset of $\mathcal{F}$ such that, for any $F \in \mathcal{F}_1$, each of the following three conditions holds:

(i) $I(F) < \infty$;

(ii) $f \ge (1 - \varepsilon)g$, where $f = \dfrac{dF}{dx}$;

(iii) $\int f\,dx = 1$, i.e. $f$ is the density of a probability distribution.

Then there exists a unique $F_0 \in \mathcal{F}_1$ such that

$$I(F_0) = \inf_{F \in \mathcal{F}_1} I(F) \tag{62}$$

and, if $T_0$ denotes the maximum likelihood estimator corresponding to $F_0$, then

$$D(T_0, F) \le D(T_0, F_0) \le D(T, F_0) \tag{63}$$

for any $F \in \mathcal{F}$ and for any asymptotically normally distributed and asymptotically unbiased estimator $T$ of $\beta$.


Proof. For F € F¥,, we may write 


I(F) = I*(F) (64) 
where 
Vid sup (f y’(x) dF (a))? (f p(x) dF(x))+ (65) 


and @ is the set of function y continuously differentiable on a compact support 
and such that qi ye dF > 0. 
Actually, using the Schwarz inequality, we get 


T(F) = sup (f vf’ da)? (fy? dF)? < I(F). 


To prove the opposite inequality, suppose that I*(#’) < oo. Denoting A:€ — IR! 
the linear functional defined by Ay = f y’ dF, we have 


Ay 
Ale =f — ([*(F))2, 
||A]| = sup ivi (1*(F)) 


lly? = fy dF. 


A, being bounded, extends to all L?(#) by continuity. Thus, there exists a 
function h € L?(F’) such that 


Ap = fyh dF. 


where 


Zz 
Put f(z) = —fh(y) dF(y), « € R1. Then f is the density of F’, is absolutely 
continuous a at =h¢ L*(F). Indeed, it follows from the Fubini 


(x) 




theorem that 
fy ak = f ply) Ay) AF) = —f f vw) hy) dFly) de 
y¥<x 
= ‘i y' (x) f(x) da forany ywé€6é. 
Hence, we may minimize I*(F) as well as I(F); since [*(F) is convex in F 


(see [A 3.9]), it suffices to find a local minimum. As we shall see, the criterion 
for (fF) attaining the minimum at Fy € F, has the form: 


1/2\"" 
(° at ) ( — fe) de = 0 (66) 
0 


dF : 
forall Fy S35 f= 1 1 and for an arbitrary constant c. To prove this, 
2 
notice that J(F’) attains its minimum at F% if and only if 


a | >0 for all Fy € Fy 
dt t=0 


where 


= 1 9| = Soin te — , UF Y) — I(Fo)] 
t=0 


dt t>0 


and F, = (1 —t) Fy) + tF,, O<St <1. Supposing that fo/fo is absolutely 
continuous, we may find by direct computation that 


d 

ee Hee } = —4 f ((A?)"/8) (hh — fo) dz 2 0 
t=0 

and this implies (66) due to the fact that i (7; — fo) ‘~ = 0. Now, for any pone 

ax of the set {x: fo(x) = (1 — e) g(x)}, we have f,(z) = fo(x) and (f)?)'"/f? = 

ee the symmetry and the pame of is If x belongs to the aH 


x: fo(x) > (1 — «) g(x)}, then f,(x) — f(z) can take on positive as well as 
Se values; suggests js ea : such that (fi/?)"/f? = ¢ for some 
constant ¢, i.e., fo(z) = a e~*!#! on this domain. 

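Let

$$x_0 = \sup\Big\{x \in \mathbb{R}^1 : -\frac{g'(x)}{g(x)} \le k\Big\},$$

where $k$ is determined by the condition

$$2\,\frac{1 - \varepsilon}{k}\Big[g(x_0) + k\int_0^{x_0} g(x)\,dx\Big] = 1. \tag{67}$$

Put

$$f_0(x) = \begin{cases} (1 - \varepsilon)\,g(x) & \text{if } 0 \le x < x_0, \\ (1 - \varepsilon)\,g(x_0)\,e^{-k(x - x_0)} & \text{if } x_0 \le x, \\ f_0(-x) & \text{if } x < 0. \end{cases} \tag{68}$$

We shall show that the maximum likelihood estimator $T_0$ corresponding to $f_0$ is asymptotically minimax for $\mathcal{F}$: put

$$\psi_0(x) = -f_0'(x)/f_0(x), \qquad x \in \mathbb{R}^1;$$

then

$$\psi_0(x) = \begin{cases} k \operatorname{sign} x & \text{if } |x| \ge x_0, \\ -g'(x)/g(x) & \text{if } |x| < x_0. \end{cases} \tag{69}$$

Then $\psi_0' \ge 0$ (because $(-\log g)$ is convex) and $D(T_0, F)$ is given by Theorem 2.3.2; namely,

$$D(T_0, F) = \Big[(1 - \varepsilon)\int \psi_0'\,dG + \varepsilon\int \psi_0'\,dH\Big]^{-2} \cdot \Big[(1 - \varepsilon)\int \psi_0^2\,dG + \varepsilon\int \psi_0^2\,dH\Big]$$
$$\le \Big[(1 - \varepsilon)\int \psi_0'\,dG\Big]^{-2} \cdot \Big[(1 - \varepsilon)\int \psi_0^2\,dG + \varepsilon k^2\Big] = D(T_0, F_0).$$

On the other hand, the inequality $D(T_0, F_0) \le D(T, F_0)$ follows from the Rao-Cramér inequality for any asymptotically normally distributed and asymptotically unbiased estimator $T$ (i.e., $D(T, F_0) \ge 1/I(F_0) = D(T_0, F_0)$ under some regularity conditions, which are fulfilled in our case).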


Remark 2.3.4 Let $F_0$ and $\psi_0$ be given by (68) and (69), respectively; put

$$\varphi_0(t) = \psi_0\big(F_0^{-1}(t)\big)$$

and

$$J_0(t) = \psi_0'\big(F_0^{-1}(t)\big)\Big(\int_{\mathbb{R}^1} \psi_0'(x)\,dF_0(x)\Big)^{-1}, \qquad 0 < t < 1.$$

Then the R- and L-estimators corresponding to $\varphi_0$ and $J_0$, respectively, are asymptotically equivalent to $T_0$ and hence also have the minimax property (63).

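Example 2.3.1 $\varepsilon$-contaminated normal distribution:

Put $G = \Phi$ in (61). The asymptotically minimax estimator is then either the M-estimator generated by $\psi$ defined in (4) with $k$ of (67), or the R-estimator with the score-generating function

$$\varphi_0(t) = \begin{cases} k & \text{if } 1 > t \ge (1 - \varepsilon)\Phi(k) + \dfrac{\varepsilon}{2}, \\ \Phi^{-1}\Big(\dfrac{t - \varepsilon/2}{1 - \varepsilon}\Big) & \text{if } \dfrac{1}{2} \le t < (1 - \varepsilon)\Phi(k) + \dfrac{\varepsilon}{2}, \\ -\varphi_0(1 - t) & \text{if } 0 < t < \dfrac{1}{2}, \end{cases}$$

or the L-estimator corresponding to the weight function

$$J_0(t) = \begin{cases} 0 & \text{if } 1 > t \ge (1 - \varepsilon)\Phi(k) + \dfrac{\varepsilon}{2}, \\ \Big[2(1 - \varepsilon)\Big(\Phi(k) - \dfrac{1}{2}\Big)\Big]^{-1} & \text{if } \dfrac{1}{2} \le t < (1 - \varepsilon)\Phi(k) + \dfrac{\varepsilon}{2}, \\ J_0(1 - t) & \text{if } 0 < t < \dfrac{1}{2}. \end{cases}$$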
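For $G = \Phi$ one has $-g'(x)/g(x) = x$, hence $x_0 = k$, and condition (67) can be rearranged to $2\varphi(k)/k - 2\Phi(-k) = \varepsilon/(1 - \varepsilon)$, where $\varphi$ here denotes the standard normal density. The sketch below solves this equation for $k$ by bisection; the function names, the bracketing interval and the iteration count are illustrative assumptions.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    # erfc keeps accuracy in the lower tail
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def contamination_lhs(k):
    """Left-hand side 2*pdf(k)/k - 2*cdf(-k); strictly decreasing in k > 0."""
    return 2.0 * norm_pdf(k) / k - 2.0 * norm_cdf(-k)

def huber_k(eps, lo=1e-6, hi=10.0, iterations=200):
    """Solve contamination_lhs(k) = eps / (1 - eps) by bisection, 0 < eps < 1."""
    target = eps / (1.0 - eps)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if contamination_lhs(mid) > target:
            lo = mid   # root lies to the right
        else:
            hi = mid   # root lies to the left
    return 0.5 * (lo + hi)

def total_mass(eps, k):
    """Integral of f_0 from (68) with g the normal density and x_0 = k."""
    return 2.0 * (1.0 - eps) * (norm_pdf(k) / k + norm_cdf(k) - 0.5)

for eps in (0.01, 0.05, 0.1, 0.2):
    k = huber_k(eps)
    print(eps, k, total_mass(eps, k))
```

The printed total mass should be 1 up to rounding, which is exactly the normalization expressed by (67); larger contamination $\varepsilon$ leads to a smaller clipping constant $k$.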


2.4 Some properties of rank tests 


The R-estimators are derived from the rank tests, so that their properties will 
follow from the properties of the tests on which they are based. Thus, we shall 
deal first with rank tests. 

The main feature of rank-based methods, and the reason for their great popularity, is the weak set of assumptions required for their validity. The null distribution of the rank-test statistics and the significance level of the tests are independent of the basic distribution $F$ and thus are exactly known. It is for this reason that the rank tests are frequently called distribution-free or nonparametric, i.e. free of the assumption that $F$ belongs to some specified parametric family of distributions.

First, we shall define the basic entities which will appear throughout the present section. Let $y^{(i)}$ denote the $i$th smallest coordinate in the vector $y = (y_1, \dots, y_n)'$, so that

$$y^{(1)} \le y^{(2)} \le \dots \le y^{(n)}. \tag{1}$$

If $(y_1, \dots, y_n)'$ is a random vector then the statistic $y^{(i)}$ is called the $i$th order statistic, and the vector $(y^{(1)}, \dots, y^{(n)})$ of order statistics is denoted $y^{(\cdot)}$.

Let $y = (y_1, \dots, y_n)$ be a vector such that no two coordinates coincide; denote by $r_i(y)$ the number of $y$'s which are $\le y_i$, i.e. the rank of $y_i$ in the sequence (1):

$$y_i = y^{(r_i(y))}, \qquad i = 1, \dots, n. \tag{2}$$

The statistic $R_i = r_i(y)$ is called the rank of $y_i$; let $R = (R_1, \dots, R_n)$. We may alternatively write

$$R_i = \sum_{j=1}^{n} u(y_i - y_j), \tag{3}$$

where $u(y) = 1$ if $y \ge 0$ and $u(y) = 0$ otherwise.

The ranks are defined unambiguously only if there are no ties among the observations, i.e. if no two observations coincide; tied observations require special treatment. The probability of coincidence of any pair of coordinates equals 0 if the distribution function of $y$ is continuous. Let $\mathcal{R}$ denote the space of all permutations $r = (r_1, \dots, r_n)$ of $(1, \dots, n)$; obviously $\mathcal{R}$ contains $n!$ points.
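The definition of ranks through the indicator sum (3) translates directly into code. The sketch below, with illustrative function names, computes the ranks of a vector without ties and checks the relation (2) between observations, ranks and order statistics.

```python
def u(t):
    """Indicator used in (3): 1 if t >= 0, otherwise 0."""
    return 1 if t >= 0 else 0

def ranks(y):
    """R_i = sum_j u(y_i - y_j); meaningful only when y has no ties."""
    return [sum(u(yi - yj) for yj in y) for yi in y]

def order_statistics(y):
    """The ordered sample y^(1) <= ... <= y^(n)."""
    return sorted(y)

y = [3.2, -1.0, 0.5, 7.1]
R = ranks(y)
ordered = order_statistics(y)
print(R)
# relation (2): each observation equals the order statistic indexed by its rank
print(all(y[i] == ordered[R[i] - 1] for i in range(len(y))))
```

Computing the ranks this way costs $O(n^2)$ comparisons; sorting gives the same result in $O(n \log n)$, but the indicator form mirrors (3) exactly.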

We say that the random vector $y = (y_1, \dots, y_n)$ satisfies the hypothesis $H_0$ of randomness if it is distributed according to the density

$$p(y_1, \dots, y_n) = \prod_{i=1}^{n} f(y_i) \tag{4}$$

where $f(x)$ is an arbitrary one-dimensional density; i.e. if the components $y_i$ are independent and identically distributed according to some density $f$.

For instance, if we consider the regression model, the hypothesis $H_0$ means that the regression part vanishes.

We say that the random vector $y = (y_1, \dots, y_n)$ satisfies the hypothesis $H_1$ of symmetry if it is distributed according to the density (4) with $f(x)$ being any one-dimensional symmetric density ($f(x) = f(-x)$, $x \in \mathbb{R}^1$). In other words, the hypothesis $H_1$ is true if the components $y_i$ are independent and identically distributed according to a symmetric density. Obviously, $H_1$ implies $H_0$.

Let us consider the statistics:

$\operatorname{sign} y = (\operatorname{sign} y_1, \dots, \operatorname{sign} y_n)$ (the sign statistics)
$|y| = (|y_1|, \dots, |y_n|)$ (absolute values of observations)
$|y|^{(\cdot)} = (|y|^{(1)}, \dots, |y|^{(n)})$ (the order statistics for absolute values)
$R^+ = (R_1^+, \dots, R_n^+)$ (the ranks of absolute values)

where $R_i^+ = \sum_{j=1}^{n} u(|y_i| - |y_j|)$, $i = 1, \dots, n$.


Let = weR®*|7;=— Lory, = —1,1— 1,:.., n}. 

Throughout this section, we shall consider the rank tests of $H_0$ and $H_1$ against the alternatives of regression in location. We shall formulate the rank tests which maximize the local power against these alternatives and prove some asymptotic properties of the tests as $n \to \infty$; this will be a starting point for investigating the asymptotic properties of R-estimates.




2.4.1 Locally most powerful rank tests 


The vector $R$ of ranks of $(y_1, \dots, y_n)$ is the maximal invariant for the problem
of testing $H_0$ against some rich sets of alternatives, under the group $\mathcal{G}$ of
transformations $\mathcal{G} = \{g' : \mathbb{R}^n \to \mathbb{R}^n,\ g'(x) = (g(x_1), \dots, g(x_n))\}$, where $g$ runs through
the set of all continuous strictly increasing functions $\mathbb{R}^1 \to \mathbb{R}^1$. For instance,
this is the case for alternatives under which the vector of observations
decomposes into two random samples with different distributions. Unfortunately,
there is no uniformly most powerful (UMP) rank test for this problem,
and thus neither is there any UMP invariant test. Thus, we shall restrict the
set of alternatives and look for a test which is most powerful locally against
a subset of alternatives near to the hypothesis among all rank tests. The
situation is analogous for $H_1$, with the only difference that the maximal
invariant is here the vector $(\operatorname{sign} X, R^+)$ and the corresponding group of
transformations is $\mathcal{G} = \{g' : \mathbb{R}^n \to \mathbb{R}^n,\ g'(x) = (g(x_1), \dots, g(x_n))\}$, where $g$ runs through the
set of all continuous, strictly increasing and odd functions $\mathbb{R}^1 \to \mathbb{R}^1$.

We shall then look for a signed-rank test which is locally most powerful for
$H_1$ against a specified set of alternatives.

In the case of $H_0$, a test is called a rank test if its test function $\xi$ is a function
of $R$ only, $\xi = \xi(R)$. The critical region of a nonrandomized rank test is a union
of some of the following events:

$$A_r = \{y \in \mathbb{R}^n : R = r\}, \qquad r \in \mathcal{R}. \tag{5}$$


Under $H_0$, we have $P(R = r) = P(A_r) = \dfrac{1}{n!}$, $r \in \mathcal{R}$. Let us consider a simple
alternative stating that $y$ has a probability distribution $Q$ and denote $Q(R = r)$
$= Q(\{y : R = r\})$. The Neyman-Pearson Lemma (see [A 3.3] of Bunke and
Bunke, 1986) then immediately gives the most powerful rank test of $H_0$ against
a simple alternative $Q$.


Lemma 2.4.1 The most powerful test of $H_0$ against a simple alternative $Q$ is
given by

$$\xi(r) = \begin{cases} 1 & \text{if } Q(R = r) > k \\ 0 & \text{if } Q(R = r) < k \end{cases} \tag{6}$$

where $k$ satisfies $E\xi(R) = \alpha$ under $H_0$.


In practice, however, the exact evaluation of $\xi(r)$ is rarely possible because
$Q(R = r)$ is difficult to compute. Then we may try to find the locally most
powerful rank tests.


Definition 2.4.1 Consider an indexed set of $n$-dimensional densities $\{q_\Delta, \Delta \ge 0\}$
and assume that the random vector $X$ with density $q_0$ satisfies the hypothesis $H$.
A test is called a locally most powerful (LMP) $\alpha$-test for $H$ against $\Delta > 0$, if
there exists an $\varepsilon > 0$ such that the test is uniformly most powerful at level $\alpha$ for
$\Delta = 0$ against $K_\varepsilon = \{q_\Delta : 0 < \Delta < \varepsilon\}$.

If $\Delta$ also may be negative, we shall call a test locally most powerful for $\Delta = 0$
against $\Delta \ne 0$ if it is uniformly most powerful for $H$ against $K_\varepsilon = \{q_\Delta : 0 < |\Delta| < \varepsilon\}$
for some $\varepsilon > 0$.


The uniformly most powerful test is also locally most powerful. On the other
hand, even if the locally most powerful test is not uniformly most powerful,
its power function increases as rapidly as possible for an $\alpha$-test in a
neighbourhood of the hypothesis.


Theorem 2.4.1 Consider a system of regression alternatives $y_i = \Delta c_i + e_i$,
$i = 1, \dots, n$, such that the density of $(y_1, \dots, y_n)$ is an element of

$$\Big\{q_\Delta = \prod_{i=1}^{n} f(y_i - \Delta c_i) : \Delta \ge 0\Big\} \tag{7}$$

where $f$ is a known density which is absolutely continuous and such that

$$\int |f'(x)|\,dx < \infty \tag{8}$$

and $c_1, \dots, c_n$ are known constants. Then the test with critical region

$$\sum_{i=1}^{n} c_i a_n(R_i, f) \ge k \tag{9}$$

is the LMPR $\alpha$-test (locally most powerful rank $\alpha$-test) for $H_0$ against (7), where

$$\alpha = P\Big\{\sum_{i=1}^{n} c_i a_n(R_i, f) \ge k\Big\}, \tag{10}$$

where $P$ is any probability distribution satisfying $H_0$ and the scores $a_n(i)$
correspond to $f$ in the following way:

$$a_n(i, f) = E\varphi(U^{(i)}, f), \qquad i = 1, \dots, n; \tag{11}$$

where

$$\varphi(t, f) = -\frac{f'(F^{-1}(t))}{f(F^{-1}(t))}, \qquad 0 < t < 1, \tag{12}$$

and $U^{(1)} < \dots < U^{(n)}$ is an ordered sample from the uniform distribution on
$(0, 1)$.
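The scores (11) are expectations of $\varphi(t, f)$ with respect to the distribution of a uniform order statistic, which is the beta distribution $B(i, n - i + 1)$. The sketch below evaluates them by a simple midpoint rule for two densities for which $\varphi(t, f)$ has a closed form: the logistic density, where $\varphi(t, f) = 2t - 1$, and the standard normal density, where $\varphi(t, f) = \Phi^{-1}(t)$. The function names and the grid size are illustrative assumptions, not part of the text.

```python
import math
from statistics import NormalDist


def beta_order_stat_pdf(t, i, n):
    """Density of U^(i), the i-th order statistic of n uniforms: Beta(i, n-i+1)."""
    log_c = math.lgamma(n + 1) - math.lgamma(i) - math.lgamma(n - i + 1)
    return math.exp(log_c + (i - 1) * math.log(t) + (n - i) * math.log(1.0 - t))


def scores(phi, n, grid=20000):
    """Approximate a_n(i) = E phi(U^(i)), i = 1..n, by the midpoint rule on (0, 1)."""
    h = 1.0 / grid
    result = []
    for i in range(1, n + 1):
        total = 0.0
        for k in range(grid):
            t = (k + 0.5) * h
            total += phi(t) * beta_order_stat_pdf(t, i, n)
        result.append(total * h)
    return result


def logistic_phi(t):
    # phi(t, f) = 2t - 1 for the logistic density f
    return 2.0 * t - 1.0


# phi(t, f) = Phi^{-1}(t) for the standard normal density f
normal_phi = NormalDist().inv_cdf

wilcoxon_scores = scores(logistic_phi, 5)
normal_scores = scores(normal_phi, 5)
```

For the logistic density the result reduces to the Wilcoxon scores $2i/(n+1) - 1$, since $EU^{(i)} = i/(n+1)$; for the normal density one obtains the expected values of normal order statistics.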


Proof. We shall prove that for any fixed $r \in \mathcal{R}$

$$\lim_{\Delta \to 0} \frac{1}{\Delta}\,[n!\,Q_\Delta(R = r) - 1] = \sum_{i=1}^{n} c_i a_n(r_i, f) \tag{13}$$

(where $Q_\Delta$ is the probability distribution corresponding to $q_\Delta$), which implies
that there exists an $\varepsilon > 0$ such that for $0 < \Delta < \varepsilon$ and for any pair $r, r' \in \mathcal{R}$
satisfying

$$\sum_{i=1}^{n} c_i a_n(r_i, f) > \sum_{i=1}^{n} c_i a_n(r_i', f)$$

it holds that

$$Q_\Delta(R = r) > Q_\Delta(R = r').$$


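Indeed, we have

$$Q_\Delta(R = r) = \int \dots \int_{R = r} q_\Delta(y_1, \dots, y_n)\,dy_1 \dots dy_n,$$

$$\frac{1}{\Delta}[n!\,Q_\Delta(R = r) - 1] = n! \sum_{i=1}^{n} \int \dots \int_{R = r} \frac{1}{\Delta}[f(y_i - \Delta c_i) - f(y_i)] \prod_{j=1}^{i-1} f(y_j - \Delta c_j) \prod_{k=i+1}^{n} f(y_k)\,dy_1 \dots dy_n$$

where we have used the identity

$$\prod_{i=1}^{n} A_i - \prod_{i=1}^{n} B_i = \sum_{i=1}^{n} (A_i - B_i) \prod_{j=1}^{i-1} A_j \prod_{k=i+1}^{n} B_k.$$

As $\Delta \to 0$, the integrands of the last integral tend to $-c_i f'(y_i) \prod_{j \ne i} f(y_j)$,
$i = 1, \dots, n$. Moreover, considering first $\Delta c_i > 0$, we get in view of (8),

$$\limsup_{\Delta \to 0} \int \dots \int \frac{1}{\Delta}\,|f(y_i - \Delta c_i) - f(y_i)| \prod_{j=1}^{i-1} f(y_j - \Delta c_j) \prod_{k=i+1}^{n} f(y_k)\,dy_1 \dots dy_n$$
$$= \limsup_{\Delta \to 0} \int \frac{1}{\Delta}\,|f(y_i - \Delta c_i) - f(y_i)|\,dy_i \le \limsup_{\Delta \to 0} \frac{1}{\Delta} \int \int_{y_i - \Delta c_i}^{y_i} |f'(x)|\,dx\,dy_i$$
$$= |c_i| \int |f'(y)|\,dy, \tag{14}$$

and similarly for $\Delta c_i < 0$; this in connection with Fatou's lemma (cf. [A 2.17])
implies

$$\lim_{\Delta \to 0} \frac{1}{\Delta}[n!\,Q_\Delta(R = r) - 1] = n! \sum_{i=1}^{n} (-c_i) \int \dots \int_{R = r} f'(y_i) \prod_{j \ne i} f(y_j)\,dy_1 \dots dy_n$$
$$= n! \sum_{i=1}^{n} c_i \int \dots \int_{R = r} \Big(-\frac{f'(y_i)}{f(y_i)}\Big) \prod_{j=1}^{n} f(y_j)\,dy_1 \dots dy_n$$
$$= \sum_{i=1}^{n} c_i\,E\Big[-\frac{f'(y_i)}{f(y_i)} \,\Big|\, R = r\Big] = \sum_{i=1}^{n} c_i a_n(r_i, f),$$

which entails (13). This, on account of Lemma 2.4.3, completes the proof.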


Theorem 2.4.2 Consider the set of alternatives (7) where $f$ is a symmetric density
satisfying (8) and $c_1, \dots, c_n$ are known constants. Then the test with the critical
region

$$\sum_{i=1}^{n} c_i \operatorname{sign} X_i\,a_n^+(R_i^+, f) \ge k \tag{15}$$

is the locally most powerful signed-rank test for $H_1$ against $\{q_\Delta : \Delta > 0\}$ at the level

$$\alpha = P\Big\{\sum_{i=1}^{n} c_i \operatorname{sign} X_i\,a_n^+(R_i^+, f) \ge k\Big\} \tag{16}$$

where $P$ is any probability distribution satisfying $H_1$ and the scores $a_n^+(i, f)$
correspond to the density $f$ in the following way:

$$a_n^+(i, f) = E\varphi\Big(\frac{U^{(i)} + 1}{2}, f\Big), \qquad i = 1, \dots, n, \tag{17}$$

where $\varphi(t, f)$ and $U^{(1)} < \dots < U^{(n)}$ are the same as in Theorem 2.4.1.
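Proof. The theorem follows from the equality

$$\lim_{\Delta \to 0} \frac{1}{\Delta}\,[2^n n!\,Q_\Delta(\operatorname{sign} y = v, R^+ = r) - 1] = \sum_{i=1}^{n} c_i v_i a_n^+(r_i, f), \qquad v \in \mathcal{V},\ r \in \mathcal{R}, \tag{18}$$

which may be proved in the same way as (13).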


We shall call the statistics

$$S = \sum_{i=1}^{n} c_i a_n(R_i) \tag{19}$$

and

$$S^+ = \sum_{i=1}^{n} c_i \operatorname{sign} y_i\,a_n^+(R_i^+), \tag{20}$$

the simple linear rank statistic and the simple linear signed-rank statistic,
respectively.
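Since every rank vector has probability $1/n!$ under $H_0$, the null distribution of the statistic (19) can be obtained for small $n$ by enumerating all permutations. The following sketch does this with exact fractions; the two-sample constants $c$ and the scores $a(i) = i$ in the usage lines are an illustrative assumption (they give the Wilcoxon rank-sum statistic).

```python
from fractions import Fraction
from itertools import permutations


def linear_rank_statistic(c, a, r):
    """S = sum_i c_i * a(r_i) for a rank vector r (1-based) and a list of scores a."""
    return sum(ci * a[ri - 1] for ci, ri in zip(c, r))


def exact_null_distribution(c, a):
    """Distribution of S under H_0, where each rank vector has probability 1/n!."""
    n = len(c)
    all_ranks = list(permutations(range(1, n + 1)))
    weight = Fraction(1, len(all_ranks))
    dist = {}
    for r in all_ranks:
        s = linear_rank_statistic(c, a, r)
        dist[s] = dist.get(s, 0) + weight
    return dist


c = [0, 0, 1, 1]  # indicator of the second sample (illustrative choice)
a = [1, 2, 3, 4]  # scores a(i) = i (illustrative choice)
dist = exact_null_distribution(c, a)
upper_tail = sum(p for s, p in dist.items() if s >= 7)  # P(S >= 7) under H_0
```

The resulting distribution does not depend on the density $f$ in (4), which is what makes the level of the test (9) computable without knowledge of $f$.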


Definition 2.4.2 Let $\beta(\alpha, H_\nu, Q_\nu)$ denote the power of the most powerful $\alpha$-test
of $H_\nu$ against the simple alternative $Q_\nu$, $\nu = 1, 2, \dots$. Then an $\alpha$-test $\xi_\nu$ is called
asymptotically most powerful for testing $H_\nu$ against $Q_\nu$ at level $\alpha$ if

$$\lim_{\nu \to \infty} \Big[\beta(\alpha, H_\nu, Q_\nu) - \int \xi_\nu\,dQ_\nu\Big] = 0.$$


Lemmas 2.4.1 and 2.4.2 show that the distribution of $S$ (of $S^+$) is the same
for all distributions of observations satisfying $H_0$ ($H_1$). It is not the case under
the alternatives, so that the power of the tests is not 'distribution-free'.

An advanced account of the theory of rank tests as well as a survey of selected
rank tests may be found in Hájek and Šidák (1967).


2.4.2 Asymptotic behaviour of rank and signed-rank test statistics 


Two asymptotic properties of the simple linear rank and signed-rank statistics, 
which will be proved in the present section, are of basic importance for the 
asymptotic theory of rank tests and estimates. The remaining text of Chapter 2 
will be, in fact, their consequence. 

First, as the number $n$ of observations becomes large and some regularity
conditions are satisfied, the null distribution of the rank statistics tends to the
normal distribution.

Second, considering $S$ (or $S^+$) under the regression alternatives (7) as a
function of $\Delta$, we shall see that this function is asymptotically of the form
$S_0 + \Delta b_n$ as $n \to \infty$ with respect to the convergence in probability. This
property is analogous to the property of $M_n(\beta)$ described in Theorem 2.3.3.

For $n = 1, 2, \dots$, let $y_{n1}, \dots, y_{nn}$ be random variables and $R_{ni}$ the rank of
$y_{ni}$ in $(y_{n1}, \dots, y_{nn})$. Consider the simple linear rank statistics

$$S_n = \sum_{i=1}^{n} x_{ni} a_n(R_{ni}), \qquad S_n^* = \sum_{i=1}^{n} x_{ni} a_n^*(R_{ni}) \tag{21}$$
under the following assumptions: 


(A 4) The scores $a_n(i)$, $a_n^*(i)$ are generated by a function $\varphi(t)$, $0 < t < 1$, which
is nonconstant, nondecreasing and square integrable on $(0, 1)$, in the
following way:

$$a_n(i) = E\varphi(U_n^{(i)}), \qquad i = 1, \dots, n \tag{22}$$

where $U_n^{(1)} < \dots < U_n^{(n)}$ is an ordered sample from a uniform
distribution on $(0, 1)$;

$$a_n^*(i) = \varphi\Big(\frac{i}{n + 1}\Big), \qquad i = 1, \dots, n. \tag{23}$$

(A 5) The regression constants $x_{n1}, \dots, x_{nn}$ satisfy

$$\sum_{i=1}^{n} (x_{ni} - \bar{x}_n)^2 > 0, \qquad \bar{x}_n = \frac{1}{n} \sum_{i=1}^{n} x_{ni} \tag{24}$$

and

$$\lim_{n \to \infty} \max_{1 \le i \le n} (x_{ni} - \bar{x}_n)^2 \Big[\sum_{j=1}^{n} (x_{nj} - \bar{x}_n)^2\Big]^{-1} = 0. \tag{25}$$

(Condition (25) (Noether's condition) guarantees the uniform asymptotic
negligibility of the summands in (19).)


Theorem 2.4.3 (Hájek 1961). Let $y_{n1}, \dots, y_{nn}$ satisfy $H_0$ for $n = 1, 2, \dots$
and the assumptions (A 4) and (A 5) be fulfilled. Then, as $n \to \infty$, the statistics
(19) are asymptotically normal $N(ES_n, \sigma_n^2)$ with

$$\sigma_n^2 = \sum_{i=1}^{n} (x_{ni} - \bar{x}_n)^2 \int_0^1 (\varphi(t) - \bar{\varphi})^2\,dt, \qquad \bar{\varphi} = \int_0^1 \varphi(t)\,dt. \tag{26}$$
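The asymptotic variance (26) can be compared with the exact null variance of a simple linear rank statistic, which is the standard permutation formula $(n-1)^{-1} \sum_i (x_{ni} - \bar{x}_n)^2 \sum_i (a_n(i) - \bar{a}_n)^2$ (the expression that also appears in the proof below). The sketch makes this comparison for the approximate scores (23) generated by $\varphi(t) = 2t - 1$, for which $\int_0^1 (\varphi - \bar{\varphi})^2 dt = 1/3$; the regression constants $x_{ni} = i/n$ are an arbitrary illustrative choice.

```python
def exact_null_variance(x, a):
    """Exact variance of S = sum_i x_i a(R_i) when R is uniform over permutations."""
    n = len(x)
    x_bar = sum(x) / n
    a_bar = sum(a) / n
    sum_x = sum((xi - x_bar) ** 2 for xi in x)
    sum_a = sum((ai - a_bar) ** 2 for ai in a)
    return sum_x * sum_a / (n - 1)


def asymptotic_variance(x, phi_variance):
    """sigma_n^2 of (26): sum_i (x_i - x_bar)^2 times the variance integral of phi."""
    n = len(x)
    x_bar = sum(x) / n
    return sum((xi - x_bar) ** 2 for xi in x) * phi_variance


n = 200
x = [i / n for i in range(1, n + 1)]                        # illustrative constants
a_star = [2.0 * i / (n + 1) - 1.0 for i in range(1, n + 1)]  # scores (23), phi(t) = 2t - 1
ratio = exact_null_variance(x, a_star) / asymptotic_variance(x, 1.0 / 3.0)
```

For these scores the ratio equals $n/(n+1)$ exactly, so the two variances agree in the limit.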


Proof. Let $y_{n1}, \dots, y_{nn}$ be the random variables distributed according to (4).

(i) Consider the scores (22) at first. Denote $C_n^2 := \sum_{i=1}^{n} (x_{ni} - \bar{x}_n)^2$. Consider the
statistics

$$T_n = \sum_{i=1}^{n} (x_{ni} - \bar{x}_n)\,\varphi(F(y_{ni})) + \bar{x}_n \sum_{i=1}^{n} a_n(i), \qquad n = 1, 2, \dots \tag{27}$$

Then $S_n = E(T_n \mid \mathcal{B}_n)$, $n = 1, 2, \dots$, where $\mathcal{B}_n = \mathcal{B}(R_{n1}, \dots, R_{nn})$ is the
$\sigma$-field generated by $R_{n1}, \dots, R_{nn}$; it follows from properties of conditional
expectation and from [A 3.12] that

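$$E(T_n - S_n)^2 = \operatorname{Var} T_n - \operatorname{Var} S_n = C_n^2 \Big[\int_0^1 (\varphi(t) - \bar{\varphi})^2\,dt - \frac{1}{n - 1} \sum_{i=1}^{n} (a_n(i) - \bar{a}_n)^2\Big] = o(C_n^2), \tag{28}$$

where $\bar{a}_n = \dfrac{1}{n} \sum_{i=1}^{n} a_n(i)$.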


(28) implies that $\sigma_n^{-1}(S_n - ES_n)$ has the same asymptotic distribution as
$\sigma_n^{-1}(T_n - ET_n)$, where $ES_n = ET_n = \bar{x}_n \sum_{i=1}^{n} a_n(i)$; the latter distribution is
$N(0, 1)$, as follows from the Lindeberg-Feller Theorem (see Bunke and Bunke,
1986, [A 4.21]).
(ii) The proof is more tedious for $S_n^*$: we need the following lemma.


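Lemma 2.4.2 Consider two sequences of functions

$$\varphi_n(t) = a_n(i), \qquad \frac{i - 1}{n} < t \le \frac{i}{n}, \quad i = 1, \dots, n,$$
$$\varphi_n^*(t) = a_n^*(i), \qquad \frac{i - 1}{n} < t \le \frac{i}{n}, \quad i = 1, \dots, n. \tag{29}$$

Then

$$\lim_{n \to \infty} \int_0^1 (\varphi_n(t) - \varphi(t))^2\,dt = 0 \tag{30}$$

and

$$\lim_{n \to \infty} \int_0^1 (\varphi_n^*(t) - \varphi(t))^2\,dt = 0. \tag{31}$$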


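Proof. First, we shall prove that

$$\varphi_n(t) \to \varphi(t) \quad \text{a.e. with respect to } \mu_{(0,1)},$$
$$\varphi_n^*(t) \to \varphi(t) \quad \text{a.e. with respect to } \mu_{(0,1)}.$$

The convergence is obvious for $\varphi_n^*$. Concerning $\varphi_n$, fix a $t_0 \in (0, 1)$ and take a
sequence $\{i_n\}$ such that $\dfrac{i_n}{n} \to t_0$.

Let

$$g_n(t) = \begin{cases} n \dbinom{n - 1}{i_n - 1} t^{i_n - 1} (1 - t)^{n - i_n} & 0 < t < 1 \\ 0 & \text{otherwise} \end{cases}$$

be the density of the beta distribution $B(i_n, n - i_n + 1)$ and let $G_n(t) = \int_0^t g_n(u)\,du$.
Then $g_n$ is unimodal with the mode $\dfrac{i_n - 1}{n - 1} \to t_0$ and

$$\lim_{n \to \infty} G_n(t) = \begin{cases} 0 & \text{if } t < t_0 \\ 1 & \text{if } t > t_0. \end{cases} \tag{32}$$

Let $\delta > 0$ be such that $(t_0 - \delta, t_0 + \delta) \subset (0, 1)$. Then, regarding that $\varphi$ is
bounded on $(t_0 - \delta, t_0 + \delta)$, we have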


to+d 


J ©) — olto)| dG,(e) 


to—d 


\Pn(to) — o(to)| =|f [p(t) — o(to)] d@,(t)| S 


+ i, (ote) —())® ae) | f anther e 


|t—to|26 |t—to|26 


as n>ow. 


This proves (32). Moreover,

$$\int_0^1 \varphi_n^2(t)\,dt = E\varphi_n^2(U_1) = E\big[E(\varphi(U_1) \mid R_{n1})\big]^2 \le \int_0^1 \varphi^2(t)\,dt$$

and this in connection with Fatou's lemma proves (30).
On the other hand, the functions g(t) are uniformly integrable on (0, 1). 


Indeed, there exists a 6 > 0 for any « > 0 such that i g(t) dt < + for any 
A 




A & (0, 1) such that (4) = bh dt < 6. Then 
(t)P dt = di etl (a 
J len) Eo (a)e(4o( Neel A 


A 
= if y(t) 


Br n|[(n+1) 


1 
n+1 
n 


y(t) dt <« for n > [6-4] 


1 1 
where #@, = A + ag té+ —:t€ Ap. This together with (32) proves (31). 
n 


(iii) We are now able to complete the proof for $S_n^*$. We get by [A 3.12] and
by direct computation,

$$E(S_n - ES_n - S_n^* + ES_n^*)^2 = \frac{1}{n - 1}\,C_n^2 \sum_{i=1}^{n} [a_n(i) - \bar{a}_n - a_n^*(i) + \bar{a}_n^*]^2$$
$$\le \frac{1}{n - 1}\,C_n^2 \sum_{i=1}^{n} (a_n(i) - a_n^*(i))^2 = \frac{n}{n - 1}\,C_n^2 \int_0^1 (\varphi_n(t) - \varphi_n^*(t))^2\,dt = o(C_n^2) \tag{33}$$

so that $\sigma_n^{-1}(S_n^* - ES_n^*)$ has the same asymptotic distribution as $\sigma_n^{-1}(S_n - ES_n)$.


An analogous theorem is valid for the simple linear signed-rank statistics
(see Hájek and Šidák, 1967).

The second basic property of simple linear rank statistics is their uniform
asymptotic linearity with respect to the regression parameter.

Suppose that $y_{n1}, \dots, y_{nn}$ are distributed according to (4) and that $I(f) < \infty$.
Let $R_{ni}^{\beta}$ be the rank of $y_{ni} + \beta d_{ni}$ in $(y_{n1} + \beta d_{n1}, \dots, y_{nn} + \beta d_{nn})$, $i = 1, \dots, n$,
where $d_{n1}, \dots, d_{nn}$ are given constants and $\beta \in \mathbb{R}^1$. Consider the statistics

$$S_{n\beta} = \sum_{i=1}^{n} x_{ni} a_n(R_{ni}^{\beta}), \qquad S_{n\beta}^* = \sum_{i=1}^{n} x_{ni} a_n^*(R_{ni}^{\beta}) \tag{34}$$


i=1 


under the assumptions (A 4), (A 5), and the following two additional assumptions:


(A 6) The constants $d_{n1}, \dots, d_{nn}$ satisfy

$$\sum_{i=1}^{n} (d_{ni} - \bar{d}_n)^2 \le M, \qquad \bar{d}_n = \frac{1}{n} \sum_{i=1}^{n} d_{ni}, \quad n = 1, 2, \dots \tag{35}$$

where $M > 0$ is a constant, and

$$\lim_{n \to \infty} \max_{1 \le i \le n} (d_{ni} - \bar{d}_n)^2 = 0. \tag{36}$$


for all $i, j = 1, \dots, n$.




The assumption (A 7) of concordance-discordance is analogous to (2.3.35);
see also Remark 2.3.2.


Theorem 2.4.4 (Jurečková, 1969). Suppose that $y_{n1}, \dots, y_{nn}$ are distributed
according to (1) where $f$ is a density with finite Fisher information. Then, under
the assumptions (A 4)-(A 7),

$$\lim_{n \to \infty} P\Big\{(\operatorname{Var} S_{n0})^{-1/2} \max_{|\beta| \le K} |S_{n\beta} - S_{n0} - \beta b_n| \ge \varepsilon\Big\} = 0 \tag{37}$$

for any $\varepsilon > 0$, $K > 0$, where

$$b_n = \sum_{i=1}^{n} (x_{ni} - \bar{x}_n)(d_{ni} - \bar{d}_n) \int_0^1 \varphi(t)\,\varphi(t, f)\,dt \tag{38}$$

with $\varphi(t, f)$ defined in (12).
Proof. We may suppose without loss of generality that $\sum_{i=1}^{n} x_{ni} = 0$, $\sum_{i=1}^{n} x_{ni}^2 = 1$.
(i) We shall first prove (37) for the scores (22) and a fixed $\beta \in \mathbb{R}^1$.
Consider a sequence $\{\varphi^{(k)}(t)\}_{k=1}^{\infty}$ of functions on $(0, 1)$


eae? (39) 


a aa 
()(¢) = i — <i 
POt) =¢ (; a i 1 i = 
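and put

$$a_n^{(k)}(i) = E\varphi^{(k)}(U_n^{(i)}), \qquad i = 1, \dots, n;\ k = 1, 2, \dots$$

Introduce the statistics

$$S_{n\beta}^{(k)} = \sum_{i=1}^{n} x_i a_n^{(k)}(R_i^{\beta}), \qquad k = 1, 2, \dots \tag{40}$$

(we are omitting the subscripts $n$ in $x_{ni}$, $R_{ni}$, etc.). Then

$$E[S_{n0} - S_{n0}^{(k)}]^2 < \varepsilon \quad \text{for } n > n_0(k, \varepsilon) \text{ and } k > k_0(\varepsilon). \tag{41}$$

Indeed,

$$E[S_{n0} - S_{n0}^{(k)}]^2 \le \frac{1}{n - 1} \sum_{i=1}^{n} [a_n(i) - a_n^{(k)}(i)]^2 = \frac{n}{n - 1} \int_0^1 (\varphi_n(t) - \varphi_n^{(k)}(t))^2\,dt$$

where

$$\varphi_n^{(k)}(t) = a_n^{(k)}(i) \quad \text{if } \frac{i - 1}{n} < t \le \frac{i}{n}.$$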

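(41) then follows from Lemma 2.4.3. Let us introduce another system of
statistics

$$T_{n\beta}^{(k)} = \sum_{i=1}^{n} x_{ni}\,\varphi^{(k)}\big(F(y_{ni} + \beta(d_{ni} - \bar{d}_n))\big), \qquad k = 1, 2, \dots \tag{42}$$

Then (28) with $\varphi$ replaced by $\varphi^{(k)}$ implies

$$\lim_{n \to \infty} E(S_{n0}^{(k)} - T_{n0}^{(k)})^2 = 0, \qquad k = 1, 2, \dots \tag{43}$$

Moreover, we shall prove

$$\lim_{n \to \infty} E[T_{n\beta}^{(k)} - ET_{n\beta}^{(k)} - T_{n0}^{(k)}]^2 = 0, \qquad k = 1, 2, \dots \tag{44}$$

In fact,

$$\operatorname{Var}(T_{n\beta}^{(k)} - T_{n0}^{(k)}) \le \sum_{i=1}^{n} x_{ni}^2 \int \big[\varphi^{(k)}(F(x + \beta(d_{ni} - \bar{d}_n))) - \varphi^{(k)}(F(x))\big]^2\,dF(x).$$

The last integral tends to 0 by [A 2.18] as $n \to \infty$, for $F$ is continuous and
$\varphi^{(k)}$ is bounded, nondecreasing, and thus has at most countably many
discontinuities.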

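The sequence of densities

$$\{q_{n\beta}\} = \Big\{\prod_{i=1}^{n} f(y_i - \beta(d_{ni} - \bar{d}_n))\Big\}_{n \in N} \tag{45}$$

is contiguous with respect to $\{p_n\}_{n \in N}$ (see [A 2.5]) and thus it follows from (43)
that $S_{n0}^{(k)} - T_{n0}^{(k)} \to 0$ in $q_{n\beta}$-probability as $n \to \infty$; hence,

$$S_{n\beta}^{(k)} - T_{n\beta}^{(k)} \xrightarrow{P} 0 \quad \text{as } n \to \infty \text{ for } k = 1, 2, \dots \tag{46}$$

[A 2.6] together with (41) imply

$$P\{|S_{n\beta} - S_{n\beta}^{(k)}| \ge \varepsilon\} < \varepsilon \quad \text{for } n > n_1(k, \varepsilon) \text{ and } k > k_1(\varepsilon). \tag{47}$$

Combining (41), (43), (44), (46), and (47), we get

$$P(|S_{n\beta} - S_{n0} - ET_{n\beta}^{(k)}| \ge \varepsilon) < \varepsilon \tag{48}$$

for $n > n_2(k, \varepsilon)$ and $k > k_2(\varepsilon)$.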
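According to [A 2.13], the statistics $T_{n\beta}^{(k)}$ are, for $k > k^*$ and as $n \to \infty$,
asymptotically normal $N(\beta b_n^{(k)}, (\sigma^{(k)})^2)$ where

$$b_n^{(k)} = \sum_{i=1}^{n} x_{ni}(d_{ni} - \bar{d}_n) \int_0^1 \varphi^{(k)}(t)\,\varphi(t, f)\,dt$$

and

$$(\sigma^{(k)})^2 = \int_0^1 (\varphi^{(k)}(t) - \bar{\varphi}^{(k)})^2\,dt, \qquad \bar{\varphi}^{(k)} = \int_0^1 \varphi^{(k)}(t)\,dt.$$

Moreover, the statistics $T_{n0}^{(k)}$ are asymptotically normally distributed
$N(0, (\sigma^{(k)})^2)$, so that, on account of (44), $ET_{n\beta}^{(k)}$ in (48) may be replaced by $\beta b_n^{(k)}$.
Further, the Schwarz inequality and (30) imply that

$$|b_n^{(k)} - b_n| = \Big|\sum_{i=1}^{n} x_{ni}(d_{ni} - \bar{d}_n) \int_0^1 (\varphi^{(k)}(t) - \varphi(t))\,\varphi(t, f)\,dt\Big|$$

tends to 0, as $k \to \infty$, uniformly in $n$; so that we finally get

$$S_{n\beta} - S_{n0} - \beta b_n \xrightarrow{P} 0 \quad \text{as } n \to \infty. \tag{49}$$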


(ii) Consider the scores (23). The convergence (49) for $S_{n\beta}^*$ follows from (33),
from the contiguity of $\{q_{n\beta}\}$ with respect to $\{p_n\}$, and from (49).


(iii) It remains to prove that the convergence (49) holds not only for a fixed $\beta$,
but also for the maximum over $\{\beta : |\beta| \le K\}$. This part of the proof is analogous
to the corresponding part of the proof of Theorem 2.3.3 concerning $M_n(\beta)$.
We only need to prove that $S_{n\beta}$ is monotone in $\beta$ for fixed $y_1, \dots, y_n$ with
probability 1. But this follows from [A 3.8] and from assumption (A 7).
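The monotonicity used in step (iii) is easy to observe numerically. The following is a small sketch under stated assumptions: $R_{ni}^{\beta}$ is taken to be the rank of $y_{ni} + \beta d_{ni}$, the scores are the approximate Wilcoxon scores $a_n^*(i) = 2i/(n+1) - 1$, and the constants are chosen so that $x_{ni}$ and $d_{ni}$ are concordant, i.e. $(x_{ni} - x_{nj})(d_{ni} - d_{nj}) \ge 0$ for all $i, j$. The seed, sample size and grid of $\beta$ values are arbitrary.

```python
import random


def ranks(values):
    """Ranks 1..n of the entries of `values` (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r


def s_n_beta(x, d, y, beta):
    """S_{n beta} = sum_i x_i a(R_i^beta) with scores a(i) = 2i/(n+1) - 1."""
    n = len(y)
    r = ranks([yi + beta * di for yi, di in zip(y, d)])
    return sum(xi * (2.0 * ri / (n + 1) - 1.0) for xi, ri in zip(x, r))


random.seed(1)
n = 25
y = [random.gauss(0.0, 1.0) for _ in range(n)]
d = [i / n for i in range(1, n + 1)]
d_bar = sum(d) / n
x = [di - d_bar for di in d]              # concordant with d, centred
betas = [k / 4 for k in range(-40, 41)]   # grid on [-10, 10]
values = [s_n_beta(x, d, y, b) for b in betas]
```

Each time two shifted observations exchange their order as $\beta$ grows, the one with the larger $d_{ni}$ (and hence the larger $x_{ni}$) moves to the higher rank, so the statistic can only increase.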


Remark 2.4.1 Theorem 2.4.4 presents the simplest version of the uniform
asymptotic linearity of simple linear rank statistics. This property could also
be proved under less restrictive assumptions, for the $p$-parameter case, etc.
The uniform asymptotic linearity for signed-rank statistics has been proved
by van Eeden (1972).


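Theorem 2.4.5 (van Eeden, 1972) For each $n \in N$, let $y_{n1}, \dots, y_{nn}$ be
independent and identically distributed random variables with common distribution
function $F$ satisfying the following conditions:

$F$ has an absolutely continuous density $f$,

$$\int_0^1 \varphi^2(t, f)\,dt < \infty,$$

$$f(-t) = f(t), \qquad t \in \mathbb{R}.$$

Let $\varphi(t)$, $0 < t < 1$, be a function such that:

$\varphi(t)$ can be written as the sum of two functions $\varphi_1(t)$ and $\varphi_2(t)$ where $\varphi_1(t)$ is
nondecreasing and nonnegative and $\varphi_2(t)$ is nonincreasing and nonpositive,

$$\int_0^1 \varphi_i^2(t)\,dt < \infty \quad (i = 1, 2) \quad \text{and} \quad \int_0^1 \varphi^2(t)\,dt > 0.$$

Let $c_{n1}, \dots, c_{nn}$ and $d_{n1}, \dots, d_{nn}$ be vectors of constants such that

$$\sum_{i=1}^{n} c_{ni}^2 > 0,$$

$$\lim_{n \to \infty} \max_{1 \le i \le n} c_{ni}^2 \Big(\sum_{j=1}^{n} c_{nj}^2\Big)^{-1} = 0,$$

$$\sum_{i=1}^{n} d_{ni}^2 \le M \quad \text{for some } M > 0 \text{ independent of } n,$$

$$\lim_{n \to \infty} \max_{1 \le i \le n} d_{ni}^2 = 0$$

and, for each $n = 1, 2, \dots$, either

$$c_{ni} d_{ni} \ge 0, \quad i = 1, \dots, n,$$
$$(|c_{ni}| - |c_{ni'}|)(|d_{ni}| - |d_{ni'}|) \ge 0 \quad \text{for all } i, i' = 1, \dots, n,$$

or

$$c_{ni} d_{ni} \le 0, \quad i = 1, \dots, n,$$
$$(|c_{ni}| - |c_{ni'}|)(|d_{ni}| - |d_{ni'}|) \ge 0 \quad \text{for all } i, i' = 1, \dots, n.$$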


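Let $R_{ni\beta}^+$ be the rank of $|y_{ni} - \beta d_{ni}|$ among $|y_{n1} - \beta d_{n1}|, \dots, |y_{nn} - \beta d_{nn}|$, and let

$$S_{n\beta}^+ = \sum_{i=1}^{n} c_{ni}\,\varphi\Big(\frac{R_{ni\beta}^+}{n + 1}\Big) \operatorname{sign}(y_{ni} - \beta d_{ni}).$$

Then, for any $\varepsilon > 0$ and $C > 0$,

$$\lim_{n \to \infty} P\Big\{\max_{|\beta| \le C} \Big|S_{n\beta}^+ - S_{n0}^+ + \beta K \sum_{i=1}^{n} c_{ni} d_{ni}\Big| \ge \varepsilon (\operatorname{Var} S_{n0}^+)^{1/2}\Big\} = 0$$

where $K = \int_0^1 \varphi(t)\,\varphi\big(\tfrac{t + 1}{2}, f\big)\,dt$.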


Remark 2.4.2 The uniform asymptotic linearity of rank, signed-rank, and
other statistics provides a basic tool for proving the asymptotic properties of
tests and estimates based on these statistics. For instance, it enables an
asymptotic treatment of nuisance parameters in hypothesis testing (see
Hájek, 1969; Jurečková, 1971b); another application consists in deriving the
asymptotic distribution of rank, signed-rank, and other estimates and in
treating the asymptotic relations between them. We shall utilize the uniform
asymptotic linearity of rank statistics several times in the subsequent text.
See also Jurečková (1973b) for some more details.




2.5 Estimators of regression coefficients based on rank tests 


Let us consider the regression model (2.2.1). We shall study the properties,
mainly the asymptotic ones, of the estimator of $\beta$ based on rank tests of $H_0$
against regression alternatives. Such are, for instance, the estimators of the
type (2.2.13) and (2.2.11).

Let $R_i^{\beta}$ denote the rank of the $i$th residual $y_i - \sum_{j=1}^{p} x_{ij}\beta_j$, and let

$$S_{nj}(y - X\beta) = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)\,a_n(R_i^{\beta}), \qquad j = 1, \dots, p, \tag{1}$$

where

$$\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}.$$

Then $S_{nj}(y)$ is a simple linear rank statistic defined in Section 2.4.1. If $a_n(i)$
$= a_n(i, f)$, $i = 1, \dots, n$ (see equation (2.4.1)), then $S_{nj}(y)$ provides the locally
most powerful test of $H_0$ (i.e., $\beta = 0$) against the alternatives that $y_{n1}, \dots, y_{nn}$
are distributed according to $\prod_{i=1}^{n} f\big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\big)$, in a neighbourhood of 0 (see
Theorem 2.4.1). The minimization (2.2.13) can be rewritten as

$$\sum_{j=1}^{p} |S_{nj}(y - X\beta)| := \min! \tag{2}$$


The statistics $S_{nj}(y - X\beta)$ are step-functions of $\beta$ and their definition could
be completed by continuity so that they are well defined for all $\beta$ unless some
components of $y$ are tied. The solution of (2) is not uniquely determined;
denote by $\mathcal{B}_n$ the set of solutions of (2). We shall show that $n^{1/2}(\beta_n - \beta)$ is
asymptotically equivalent to $n^{-1/2}\gamma^{-1} W_n^{-1} S_n(y)$ for any $\beta_n \in \mathcal{B}_n$, as $n \to \infty$, where
$S_n(y) = (S_{n1}(y), \dots, S_{np}(y))'$ (with $\gamma$ and $W_n$ as defined in Section 2.5.1).
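For $p = 1$ the minimization (2) can be carried out by brute force, because $S_{n1}(y - x\beta)$ is a nonincreasing step function of $\beta$ (for nondecreasing scores) whose jumps occur at the pairwise slopes $(y_j - y_i)/(x_j - x_i)$. The sketch below, which is only an illustration and not an algorithm proposed in the text, evaluates $|S_{n1}|$ at the midpoints between consecutive pairwise slopes and returns a midpoint where it is smallest. The Wilcoxon scores $a_n^*(i) = 2i/(n+1) - 1$ and the data are assumptions made for the example.

```python
from itertools import combinations


def ranks(values):
    """Ranks 1..n of the entries of `values` (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r


def rank_statistic(x, y, beta):
    """S_n(y - x*beta) = sum_i (x_i - x_bar) a(R_i^beta), a(i) = 2i/(n+1) - 1."""
    n = len(x)
    x_bar = sum(x) / n
    r = ranks([yi - xi * beta for xi, yi in zip(x, y)])
    return sum((xi - x_bar) * (2.0 * ri / (n + 1) - 1.0) for xi, ri in zip(x, r))


def r_estimate_slope(x, y):
    """A point minimizing |S_n(y - x*beta)| over midpoints between pairwise slopes."""
    slopes = sorted({(y[j] - y[i]) / (x[j] - x[i])
                     for i, j in combinations(range(len(x)), 2) if x[i] != x[j]})
    midpoints = [(lo + hi) / 2.0 for lo, hi in zip(slopes, slopes[1:])]
    return min(midpoints, key=lambda b: abs(rank_statistic(x, y, b)))


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
errors = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0]           # illustrative disturbances
y = [3.0 * xi + ei for xi, ei in zip(x, errors)]      # true slope 3
beta_hat = r_estimate_slope(x, y)
```

The cost of this search grows like $n^2$ evaluations of a statistic that itself requires a new ordering, which is why it serves only as an illustration for small samples.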

Jaeckel (1972) suggested any solution of the minimization (2.2.11) as an
estimator of $\beta$ and proved that the solutions of (2.2.11) and (2.2.13) are
asymptotically equivalent in the sense that their difference tends to zero in probability
as $n \to \infty$. (2.2.11) involves the minimization of a convex function of $\beta$; the
function and its derivatives could be calculated everywhere; in fact, the
derivatives are $-S_{nj}(y - X\beta)$, $j = 1, \dots, p$.




Koul (1971) considered the confidence region

$$\{\beta : S_n(y - X\beta)'\,W_n^{-1}\,S_n(y - X\beta) \le k_\alpha\} \tag{3}$$

where $k_\alpha$ is the critical value of the $\chi^2$ distribution, and suggested the centre of
gravity of (3) as an estimator of $\beta$. Again, this estimator is asymptotically
equivalent to the solution of (2).

We shall find the asymptotic distribution of the estimator given by (1) (and
at the same time that of any of the two other estimators). We shall show that
the procedures yield asymptotically efficient estimators by an appropriate choice
of the scores $a_n(i)$. Any of the three classes of estimators contains an
asymptotic minimax estimator, defined in Section 2.3.

The explicit form of the estimator is known only in simple special cases.
For instance, in the case of shift in location ($p = 1$; $x_i = 0$, $i = 1, \dots, m$;
$x_i = 1$, $i = m + 1, \dots, n$) the R-estimator $\hat{\beta}$ of $\beta$ is the median of the $m(n - m)$
differences $(y_{m+j} - y_i)$, $i = 1, \dots, m$; $j = 1, \dots, n - m$ (Hodges and Lehmann,
1963).
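The two-sample shift estimator just described takes only a few lines to compute. The sketch below forms all $m(n - m)$ differences between the second and the first sample and returns their median; the function name and the data are illustrative.

```python
from statistics import median


def hodges_lehmann_shift(first, second):
    """Median of all differences (second-sample value) - (first-sample value)."""
    return median(b - a for a in first for b in second)


first = [1.0, 2.0, 4.0]            # observations with x_i = 0
second = [3.5, 5.0, 7.0, 8.0]      # observations with x_i = 1
shift = hodges_lehmann_shift(first, second)
```

Here there are $3 \cdot 4 = 12$ differences; the two middle ones are 3 and 4, so the estimate is 3.5.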

In general, appropriate computational algorithms have not yet been elabo- 
rated. Any such algorithm should be iterative and every step requires a new 
ordering. From this point of view, the linearized versions of R-estimators are 
more convenient, for their calculation needs only one ordering. 

A linearized version of rank estimators was studied by Kraft and van Eeden
(1972b); it will be described in Section 2.5.2. It consists of a consistent initial
estimator and an additive term based on ranks. The linearized rank estimator
could also yield an asymptotically efficient estimator by an appropriate choice of
the scores. Nevertheless, it is not asymptotically equivalent to the R-estimator.

The optimal choice of an estimator within one of the classes mentioned above
depends on the basic distribution $F$. If this is unknown then one might use a
part of the observations to estimate $F$, and then adapt the estimator of $\beta$ to
this estimated $F$ and in this manner obtain an asymptotically efficient estimator
of $\beta$. This idea was first suggested by Stein (1956). A more detailed description
of the adaptive estimator can be found in Section 2.5.3. Despite the excellent
asymptotic properties of the estimators they are not convenient for practical
purposes unless the number of observations is extremely large.

We shall deal only with adaptive estimators based on ranks; but regarding 
the close relationships between different types of robust estimators we could 
imagine that the considerations could be modified in order to obtain adaptive 
M- and L-estimators. 


2.5.1 Asymptotic normality of R-estimators 
Consider the model (2.2.1) under the following system of assumptions: 


(A 8) The error distribution $F$ satisfies the assumption (A 1) of Section 2.3.4.2.


(A 9) Let $X_n = ((x_{ij}))_{i = 1, \dots, n;\ j = 1, \dots, p}$ be a known $(n \times p)$ matrix satisfying

(a) $x_{ij} = x_{ij}^* + x_{ij}^{**}$, $i = 1, \dots, n$; $j = 1, \dots, p$.

(b) The vectors $x_j^* = (x_{1j}^*, \dots, x_{nj}^*)'$, $j = 1, \dots, p$, satisfy

$$n^{-1}(x_j^* - \bar{x}_j^*)'(x_j^* - \bar{x}_j^*) \le M, \qquad \bar{x}_j^* = \frac{1}{n} \sum_{i=1}^{n} x_{ij}^*, \quad j = 1, \dots, p, \tag{4}$$

where the scalar product in (4) is either zero for all but a finite number
of $n$ or positive for all but a finite number of $n$; if it is positive, then

$$\lim_{n \to \infty} \Big\{\max_{1 \le i \le n} (x_{ij}^* - \bar{x}_j^*)^2 \Big[\sum_{i=1}^{n} (x_{ij}^* - \bar{x}_j^*)^2\Big]^{-1}\Big\} = 0, \qquad j = 1, \dots, p; \tag{5}$$

$M > 0$ is a constant. Analogous conditions (including Noether's condition)
are to be satisfied for the vectors $x_j^{**}$, $j = 1, \dots, p$.

(c) The inequalities (2.3.35) hold for all pairs $j, h = 1, \dots, p$ and $i, k = 1, \dots, n$.

(d) $\lim_{n \to \infty} W_n = \Sigma = ((\sigma_{jk}))_{j,k=1}^{p}$ exists and is positive definite, where
$W_n = ((w_{jk}^{(n)}))_{j,k=1}^{p}$ and

$$w_{jk}^{(n)} = \frac{1}{n} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k), \qquad j, k = 1, \dots, p. \tag{6}$$


Remark 2.5.1 Remark 2.3.3 applies to the assumption (A 9) as well.


(A 10) Let $S_{nj}(y - X\beta)$, $j = 1, \dots, p$, be the statistics of (1) with the scores
$a_n(i)$, $i = 1, \dots, n$, generated by a function $\varphi(t)$, $0 < t < 1$, satisfying
the assumption (A 4) of Section 2.4.2, either by (2.4.28) or by (2.4.29).
Denote

$$\gamma := \int_0^1 \varphi(t)\,\varphi(t, f)\,dt \tag{7A}$$

with $\varphi(t, f)$ defined in (2.4.12) and

$$\sigma^2 = \int_0^1 (\varphi(t) - \bar{\varphi})^2\,dt, \qquad \bar{\varphi} = \int_0^1 \varphi(t)\,dt. \tag{7B}$$

Let $\mathcal{B}_n$ be the set of solutions of the minimization (2). We shall accept any
point of $\mathcal{B}_n$ as an estimator of $\beta$.

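Definition 2.5.1 We say that $n^{1/2}(\beta_n - \beta)$ is asymptotically normal $N_p(a, A)$
pointwise over the set $\mathcal{B}_n$, if there exists a sequence of random vectors $\{T_n\}_{n \in N}$ such
that $n^{1/2}(T_n - \beta)$ is asymptotically normal $N_p(a, A)$ and

$$\sup_{\beta_n \in \mathcal{B}_n} \|n^{1/2}(\beta_n - T_n)\| \xrightarrow{P} 0 \quad \text{as } n \to \infty. \tag{8}$$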


The main theorem of the section follows. 


Theorem 2.5.1 Under the assumptions (A 8)-(A 10), $n^{1/2}(\beta_n - \beta)$ is
asymptotically normal

$$N_p\Big(0, \frac{\sigma^2}{\gamma^2}\,\Sigma^{-1}\Big) \quad \text{pointwise over } \mathcal{B}_n. \tag{9}$$

Proof. The proof follows ideas similar to those of the proof of Theorem 2.3.2.
Our main tool is the uniform asymptotic linearity of rank statistics. We shall
first extend this property to the multiparameter case.


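Lemma 2.5.1 Under the assumptions (A 8), (A 9), (A 10),

$$\lim_{n \to \infty} \max_{1 \le j \le p} P_{\beta^0}\Big\{\max_{n^{1/2}\|\beta - \beta^0\| \le K} n^{-1/2}\,\big|S_{nj}(y - X\beta) - S_{nj}(y - X\beta^0) + n\gamma\,\sigma_{\cdot j}'(\beta - \beta^0)\big| \ge \varepsilon\Big\} = 0 \tag{10}$$

for any $K > 0$, $\varepsilon > 0$ and any fixed $\beta^0 \in \mathbb{R}^p$; $\sigma_{\cdot j}$ is the $j$th column of $\Sigma$.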


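Proof of Lemma 2.5.1 We may suppose without loss of generality that $\varphi$ is
nondecreasing and that $\beta^0 = 0$. For each $h$, $1 \le h \le p$, let $\Lambda_h = \{\lambda \in \mathbb{R}^p \mid
\lambda_k = 0 \text{ for } k \ne h,\ k = 1, \dots, p\}$. Then it follows from Theorem 2.4.4 that

$$\lim_{n \to \infty} \max_{1 \le j \le p} P_0\big\{n^{-1/2}\,|S_{nj}(y - X\beta^{(n)}) - S_{nj}(y) + n\gamma\,\beta_h^{(n)} \sigma_{hj}| \ge \varepsilon\big\} = 0$$

for any $\varepsilon > 0$ and for any sequence $\{\beta^{(n)}\}_{n=1}^{\infty}$ such that $n^{1/2}\beta^{(n)} = \lambda \in \Lambda_h$,
$h = 1, \dots, p$.

Let $R_i(\beta^*, \beta^{**})$ be the rank of $y_{ni} - \sum_{j=1}^{p} (x_{ij}^*\beta_j^* + x_{ij}^{**}\beta_j^{**})$, $i = 1, \dots, n$,
where $\{\beta^*\} = \{\beta^{*(n)}\}_{n=1}^{\infty}$ and $\{\beta^{**}\} = \{\beta^{**(n)}\}_{n=1}^{\infty}$ are two sequences of vectors
from $\mathbb{R}^p$ such that $n^{1/2}\beta^* = \lambda^{(1)}$, $n^{1/2}\beta^{**} = \lambda^{(2)}$, $n = 1, 2, \dots$, $\|\lambda^{(1)}\|, \|\lambda^{(2)}\| \le K$.

Introduce the statistics

$$S_{nj}^*(\beta^*, \beta^{**}) = \sum_{i=1}^{n} (x_{ij}^* - \bar{x}_j^*)\,a_n(R_i(\beta^*, \beta^{**})),$$
$$S_{nj}^{**}(\beta^*, \beta^{**}) = \sum_{i=1}^{n} (x_{ij}^{**} - \bar{x}_j^{**})\,a_n(R_i(\beta^*, \beta^{**})) \tag{12}$$

and

$$S_{nj}(\beta^*, \beta^{**}) = S_{nj}^*(\beta^*, \beta^{**}) + S_{nj}^{**}(\beta^*, \beta^{**}), \qquad j = 1, \dots, p. \tag{13}$$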


The sequences of densities {9%}, {q7,*}, where 
n Pp 


and 


- - we TR I 
dn (y) = at a — 65°) Bj ) 


are contiguous with respect to $p_n(y) = \prod_{i=1}^{n} f(y_i)$, $n = 1, 2, \dots$ (see [A 2.5]).

This in connection with (10) implies that


lim P, ‘ae S,,;(B*, B**) — Sr;(y) 


n—>co 


+ S60 ie — Ue)! (w%, — BE) + Ber (ait — Bry’ (whet — 24") 


ze =o, 


(14) 


On the other hand, it follows from (2.3.5) and from Jurečková (1969, theorem
2.1) that $S_{nj}^*$ is nonincreasing in $\beta_1^*, \dots, \beta_p^*$ and nondecreasing in $\beta_1^{**}, \dots, \beta_p^{**}$,
while $S_{nj}^{**}$ is nondecreasing in $\beta_1^*, \dots, \beta_p^*$ and nonincreasing in $\beta_1^{**}, \dots, \beta_p^{**}$.

The rest of the proof is quite analogous to the corresponding part of the
proof of Theorem 2.3.3 (see (2.3.51)-(2.3.53)).


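Consider the sequence $\{v_n\}$ of random vectors

$$v_n = n^{-1/2}\,\gamma^{-1}\,\Sigma^{-1} S_n(y - X\beta^0) \tag{15}$$

where $S_n(y - X\beta^0) = (S_{n1}(y - X\beta^0), \dots, S_{np}(y - X\beta^0))'$ and $\beta^0$ is the true
(unknown) parameter value. Then Theorem 2.4.3 implies that $v_n$ is asymptotically
normal $N_p\big(0, \frac{\sigma^2}{\gamma^2}\,\Sigma^{-1}\big)$. The proof of Theorem 2.5.1 will be complete if
we prove the following lemma.

Lemma 2.5.2 Under the assumptions (A 8)-(A 10),

$$P_{\beta^0}\Big\{\sup_{\beta_n \in \mathcal{B}_n} \|n^{1/2}(\beta_n - \beta^0) - v_n\| \ge \varepsilon\Big\} \to 0 \quad \text{as } n \to \infty \text{ for any } \varepsilon > 0. \tag{16}$$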
Proof. We may put $\beta^0 = 0$. Denote

$$\Gamma_K^{(n)} = \{\beta \in \mathbb{R}^p \mid n^{1/2}\|\beta\| \le K\}.$$

Lemma 2.5.1 and the continuity of the operator $\Sigma^{-1}$ imply that

lim Py) sup |n¥?98, — V,l| =e, Fn IY +0 
BnEBrnI , 


n—>-co 


n—oo 


= lint Ps | sup ne 
mB all SK 


‘ oa nl — XBn) — Sry) + 2B n z= 0 
(17) 


for any e > 0, K > 0. Moreover, there exist K* = K*(e) > 0 and m = nj(e) 
such that 


Le Semen ess hy Se ae) 




To prove (18), take M > 0 such that (—M) < ra where @ is the stand- 
p 
ard normal distribution function and K* and 6 satisfy 


2\1/2 
se ecclesia 


Myp'?x (19) 
Ao 


ro] 


where Ay is the minimal eigenvalue of 2. Then there exists a positive integer 
m, such that 


Pot min (—f’-S,(y — XB)) < onl La ee (20) 
n¥/2\||< K* 2 


hold for all n > n, with 6* = 6K*. 
Actually, the left-hand side of (20) is bounded from above by the sum 


Py| min (—6’- S,(y — XB)) <6, 
n*!?||B|| = K* 


fain (—B') [Saty) — ny38] = 20+] 


ni!?||p|| = K* 


+Pe min (=P) [S,(9) — nyBp) < 2644, (21) 


nl?||B||= K* 


The first term of (21) is less than or equal to 


Po max (—F')[Saly) — myEB — Suly — XB] & O* 
n}/?||B\| = K* 


P O* 

<= Po} max nl?) |S,i(y) — nyo iB — Saj(y — XB)| = a ai0 
maigj=—K* — j=1 K 

(22) 


n —> co on account of Lemma 2.5.1. 


The second term of (21) is less than or equal to 


p 
Po{n-¥? |8,(y)|| > K*Aoy — 20} S x nH? |S ,(y)| = oN} 


(23) 


IIA 


€ 
— for n>% 
4 

in view of (19) and of Theorem 2.4.1. (20) then follows from (22) and (23). 


Lf 8, € IR? — 1%) then m1? ||6,|| = K* for By = eo nl2 and 


; 1 (_ pay gy —X 
2 lSuily — XB) 2 aq (PY uly — 8) 


so that, according to (20), 


Pe{ min ih ¥ suly — X <3} 


pelR?—I\) pat 
< Pst min (—£'S,(y — XB)) < ox aie (24) 
n/2||B|| = K* 2 
for n > 7. 


On the other hand, Lemma 2.5.1 implies 


P 
P> jin nV? Y S,(y — XB)| = | — for n>, (25) 
j=l 


Ae 
pel? 2 
and (18) follows from (23) and (24). 
Finally, applying (17) and (18), we get 


P, | sup |in'"B, — v,l| = ‘| 


BrEBn 


Bn€ Ban 2 


= Po sup ||n¥/28, — v,|| 22,82, 9 IM + at 


ci Po sup ||n2?6,, sce Vall = é, sup |jn4?2B, re all <6, 
Bnf Bn BrEBn—I 


By 9 TE OF + Pal By — IR 0) 0, as n—->oo. Hl 


Corollary 2.5.1 If $\varphi(t) = \varphi(t, f)$, $0 < t < 1$, and if the assumptions (A 8)-(A 10)
are satisfied, then $n^{1/2}(\beta_n - \beta^0)$ is asymptotically normally distributed

$$N_p\Big(0, \frac{1}{I(f)}\,\Sigma^{-1}\Big)$$

pointwise over $\mathcal{B}_n$; hence, $\beta_n$ provides an asymptotically efficient estimator.


Proof. The asymptotic covariance matrix of (9) reduces to

$$(\sigma^2/\gamma^2)\,\Sigma^{-1} = \int_0^1 \varphi^2(t, f)\,dt \Big(\int_0^1 \varphi^2(t, f)\,dt\Big)^{-2} \Sigma^{-1} = \frac{1}{I(f)}\,\Sigma^{-1}.$$
Table 2.5.1 gives the asymptotic efficiencies $e(F)$ of the R-estimator based
on the Wilcoxon test (i.e. $\varphi(t) = 2t - 1$, $0 < t < 1$) with respect to the least
squares estimator for the normal, double-exponential, and logistic error
distributions (the efficiencies are measured by the ratios of determinants of
asymptotic covariance matrices).
$e(F)$ never falls below 0.864 and can be infinite. Hodges and Lehmann (1963)
showed that both bounds can be attained by specific distributions.




Table 2.5.1 
F e(F) 
normal 0.955 
double-exponential 1.500 
logistic 1.097 
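The entries of Table 2.5.1 can be checked numerically. For $\varphi(t) = 2t - 1$ one has $\sigma^2 = 1/3$ and, by integration by parts, $\gamma = 2\int f^2(x)\,dx$, so that with error variance $\sigma_F^2$ the efficiency for a single parameter is $e(F) = \sigma_F^2\gamma^2/\sigma^2 = 12\,\sigma_F^2 \big(\int f^2(x)\,dx\big)^2$. The sketch below evaluates this standard formula with a trapezoidal rule; the integration limits, the step count, and the use of unit-scale densities (with their known variances $1$, $2$ and $\pi^2/3$) are assumptions of the example.

```python
import math


def integral_f_squared(pdf, lo=-40.0, hi=40.0, steps=80000):
    """Trapezoidal approximation of the integral of pdf(x)^2 over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (pdf(lo) ** 2 + pdf(hi) ** 2)
    for k in range(1, steps):
        total += pdf(lo + k * h) ** 2
    return total * h


def wilcoxon_vs_least_squares(pdf, variance):
    """e(F) = 12 * Var(F) * (integral of f^2)^2 for the Wilcoxon scores."""
    return 12.0 * variance * integral_f_squared(pdf) ** 2


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def double_exponential_pdf(x):
    return 0.5 * math.exp(-abs(x))


def logistic_pdf(x):
    z = math.exp(-abs(x))  # symmetric form, avoids overflow for large |x|
    return z / (1.0 + z) ** 2


efficiencies = {
    "normal": wilcoxon_vs_least_squares(normal_pdf, 1.0),
    "double-exponential": wilcoxon_vs_least_squares(double_exponential_pdf, 2.0),
    "logistic": wilcoxon_vs_least_squares(logistic_pdf, math.pi ** 2 / 3.0),
}
```

The three values correspond to $3/\pi$, $3/2$ and $\pi^2/9$, in agreement with the table.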


2.5.2 Linearized rank estimators and their asymptotic distribution 


Let us keep the assumptions (A 8), (A 9), and (A 10) of Section 2.5.1. Let $\tilde{\beta}_n$
be a sequence of initial estimators satisfying the assumption of equivariance:


(A 11) $$\tilde{\beta}_n(a(y - X\beta)) = a(\tilde{\beta}_n(y) - \beta) \tag{26}$$

for all $\beta \in \mathbb{R}^p$ and $a > 0$,

and there exists a function $\varphi_1(t)$, $0 < t < 1$, satisfying assumption (A 4)
of Section 2.4.2 and such that


ml E B ( fa op a)" 280%] (27) 
) 


tends in probability to zero under the hypothesis 8 = 0, where 
Se) (SW). sas only) 


and 


= s 


Let $\varphi$ be a function satisfying the assumption (A 4) of Section 2.4.2 and
let $S_{nk}(y - X\beta)$, $k = 1, \dots, p$, be the statistics given in (1). The linearized rank
estimator of $\beta$ is then defined as


6=6.+ + W,Syy — Xf) (28) 


where W,, is given by (6) and a? by (8). 


Remark 2.5.2 The least squares estimator satisfies (A 11) with $\varphi_1(t) = F^{-1}(t)$,
$0 < t < 1$ (see Section 2.6).


Denote 


a= (nl) Ard, A= { mlt)det 
0 0 
1 
n= f elt) ot f) dt (29) 
0 


O90) ae f(a@ am f1) (y(t) a 7) dt. 


0 


The following theorem states that the linearized rank estimators (27) have an
asymptotic normal distribution.


Theorem 2.5.2 Suppose that the assumptions (A 8)—(A 11) are satisfied. Then, 
as 1 —> oo, nil2(3 — B) have the asymptotic normal distribution N (0, xX~1) 


-9(-2)24(- 
Corollary 2.5.2 If $\varphi(t) = \varphi(t, f)$, $0 < t < 1$, then $n^{1/2}(\hat{\beta} - \beta)$ has the asymptotic
normal distribution $N_p\big(0, \frac{1}{I(f)}\,\Sigma^{-1}\big)$ for any initial estimator $\tilde{\beta}_n$; $\hat{\beta}$ is then an
asymptotically efficient estimator of $\beta$.

Proof of Theorem 2.5.2. We can suppose that $\beta = 0$. The following lemma is
an easy consequence of Lemma 2.5.1.


Lemma 2.5.3 Suppose that the sequence { Bi} of random vectors is asymptotically 
bounded in probability. Then, under the assumptions (A 9) and (A 10), 


n-2 \IS,(y — XB) — Saly) + yrZB,|| 2+ 0, as n—->co. (30) 


Definition 2.5.2 The random vectors $u_n$ and $v_n$ are called asymptotically
P-equivalent (denoted $u_n \sim v_n$) if $P\{\|u_n - v_n\| > \varepsilon\} \to 0$, as $n \to \infty$, for any
$\varepsilon > 0$.


Lemma 2.5.3 and (26) imply that 


mis mw acanss| 7 (1 — 2) sire + sa). (31) 
V1 oo a? 
[A 2.14] implies that 
AAS) nAPT MY), nAPS (y) ~ 2PL(y) (32) 

where 
PG) Ee bap) 
Try) = (Tai(y); 9% Pag): 

and 


(33) 
Ty) = X (wy —%) (Fy)), F=1,...p. 


The asymptotic distribution of $n^{1/2}(\hat\beta - \beta)$ then follows from (31), (32) and
(33) and from Bunke and Bunke (1986, theorem 2.4.3). □




2.5.3 Adaptive rank estimators 


The optimal rank or linearized rank estimates are based on the score function
$\varphi(t) = \varphi(t, f)$ and thus they may be determined only if we know the
corresponding error density $f$. More precisely, it suffices to know only the type of $f$,
for the optimum rank tests are invariant with respect to changes of location
and scale of $f$.

However, we usually do not know even the type of $f$, and it is just this lack of
knowledge that stimulates the use of rank tests. Then we either prefer
simplicity to efficiency and make use of some common rank test (e.g. the
Wilcoxon test), or we may try to estimate $f$ from all or part of the observations.

Since the rank statistics depend on $f$ through $\varphi(t, f)$, we shall try to estimate
the latter function. The estimator of $\beta$ will be either the rank or the linearized rank
estimator based on the estimated $\varphi(t, f)$.

We shall briefly describe three adaptive procedures. They differ in the way
of estimating $\varphi(t, f)$.

The first procedure was suggested by Hájek (1970) and consists in selecting
one of $k$ distinct density types $\mathcal{F}_1, \ldots, \mathcal{F}_k$ generated by densities $f_1, \ldots, f_k$,
i Ween Vek on Gn: 


$\mathcal{F}_j = \{f : f(x) = \lambda f_j(\lambda x - u),\ u \in \mathbb{R}^1,\ \lambda > 0\}, \quad j = 1, \ldots, k.$ (34)

Let $y_{n1}, \ldots, y_{nn}$ be independent observations such that $y_{ni} - \beta x_{ni}$ has a
density $f$ which belongs to $\bigcup_{j=1}^{k} \mathcal{F}_j$ but otherwise is unknown ($i = 1, \ldots, n$).
We shall try to find $j$ such that $f \in \mathcal{F}_j$. The decision procedure $\delta(y_1, \ldots, y_n)$
takes on values in $\{1, \ldots, k\}$. Let $L(j, d)$ be the loss corresponding to $f \in \mathcal{F}_j$
and $\delta(y) = d$:


$L(j, d) = 0$ if $j = d$, $\quad L(j, d) = 1$ if $j \ne d$. (35)


Restricting ourselves to the procedures invariant under the group of positive
linear transformations, $z(x) = ax + b$, $a > 0$, we get the risk of the procedure
given $f \in \mathcal{F}_j$,


$R(j, \delta) = 1 - P\{\delta(y_1, \ldots, y_n) = j \mid f \in \mathcal{F}_j\}.$ (36)


Let us fix 8 > 0 and compute the ratios 


ty = (3 ms — BF)” TU" ESPs + Ben) — SPMD 


Syn + Ban) = Xi Eni — %y) a(R), Mele (38) 




Ré is the rank of Ypj + Bon; among Yn + BXq1,---> Yan + B%nn» t= 1,...,0 
and 


a) =o(— i] i= dete ats j= 1, +k. (39) 


The 1,; are invariant under positive linear transformations and it follows from 
Theorem 2.4.4 that 


iim n Pa Ha ae re z)| (hs)? Boia] = ‘| ai) (40) 
Thee 


where dP, = f;, dy and 
1 
Ojn a | m fr) v(é, fi) a| [I(F;) I(F,)}72, 9, h —— i Bey) ke. 
0 


It follows from (34) and from the Cauchy-Schwarz inequality that 
Op ied fag lor aj cake 9, il eee 


Consequently, the decision procedure $\delta^*$


$[\delta^*(y_1, \ldots, y_n) = j] \iff \left[l_{nj}(y_1, \ldots, y_n) = \max_{1 \le h \le k} l_{nh}(y_1, \ldots, y_n)\right]$ (41)

is consistent in the sense that

$\sum_{h=1}^{k} R(h, \delta^*) \to 0$, as $n \to \infty$, for any fixed $\beta$.


To prove this, we may We in view of (40) for n = m(e) (denoting 


a= | Seu —# — &p | aa 


k k 
R(h, 5*) = ¥ PS*(y) =F |F € Fr) SD P(lnjly) = taal) | f © Fa) 
er jek 


SY Plan( Ita)? Bain + © = an(I(fa))"? Bonn — 
jh 
[l,j = an(Z(fx))¥? Boin| <é, [Un < a,(I(f,))¥? Bo;;| = eb + €; 


and as it follows from (41), the last probabilities are equal to zero for sufficiently 
small «. 

Let us summarize how $\delta^*$ may be applied in estimation. Consider the problem
of estimating $\beta$ on the basis of observations $y_1, \ldots, y_n$, where $y_i - \beta x_i$ has
a density $f$. A proper estimator is then a solution of the minimization (2) or the
linearized rank estimator (27) with the scores $a_n(i)$ corresponding to the
density $f$ in the well-known way. If the type of $f$ is not known, we perform the
decision procedure $\delta^*$ based on a part of the observations, say $y_1, \ldots, y_{m_n}$,
where $m_n \to \infty$, $m_n = O(n)$. We take a certain number of density types and
compute for them the quantities $l_{nj}$ applied to $y_1, \ldots, y_{m_n}$ and to any fixed
$\beta > 0$. The density type providing the largest $l_{nj}$ is then chosen to generate
the scores; the estimator is computed from $y_{m_n+1}, \ldots, y_n$. It need not necessarily
be an R-estimator but could be an M-estimator or L-estimator as well.

The procedure thus selects one of a finite set $\mathcal{F}$ of distribution shapes. It
has undisputed advantages: the shapes may be chosen so as to
lead to well-known rank tests, etc. The procedure is consistent for any
distribution from $\mathcal{F}$.

On the other hand, nothing is known if the true distribution is not
contained in $\mathcal{F}$. The following two procedures provide asymptotically efficient
estimators for any regular distribution. The first one is due to Hájek (1962), who
utilized it to construct a test which is asymptotically optimal for all $f$ with
$I(f) < \infty$. Van Eeden (1970) then suggested an estimator of the shift in
location derived from Hájek’s test in the manner of Hodges and Lehmann. Beran
(1974) suggested a Fourier series estimator of $\varphi(t, f)$ and the computation of the
linearized rank estimator based on it.

Suppose that $y_{ni}$ is distributed according to the density $f_0(y - \beta x_{ni})$, $i = 1, \ldots, n$,
such that $I(f_0) < \infty$ and that $\varphi(t, f_0)$ is nondecreasing in $t \in (0, 1)$.

Let $\{K_n\}$ be a sequence of integers satisfying


$K_n \to \infty, \quad K_n/n \to 0,$ (42)
let $\{p_n\}$ be a sequence of integers satisfying
KP - po = KP 41 (43) 


and let {0 = hao < haa <-+: < hag, = K,} be a sequence of (g, + 1)-tuples 
of integers, satisfying 


lim max |ha,ji1 — hn,;|/K2% (44) 
no OSj<qn 
=lm min ln i+ a hn || Kn = 1. (45) 


moo 0SjS4n 


Let $y^{(1)} < \cdots < y^{(K_n)}$ be the order statistics of $y_{n1}, \ldots, y_{nK_n}$. Then Hájek’s
estimator $\hat\varphi_n(t)$ of $\varphi(t, f_0)$ based on $y_{n1}, \ldots, y_{nK_n}$ is given by


ed 
ibe ao) 


role 


1 
Ko 1 a (46) 
te yshinat Pn) — ylins— Pad) ying t Pn) oo ylbasi— Pa) 
n 




for 
hnj ee a a Na ja 
Ke n—K,+1 °° &#&424&, 


> 


Piao ens Ore as areca Eatery, weaeaay «GAP 
the definition is completed by taking ¢,(¢) constant on the intervals 


( i—1 i | i= 1,2,...,n — K,; and G,(t) = 0 otherwise. 


ee on 
Then $\hat\varphi_n(t)$ is a consistent estimator of $\varphi(t, f_0)$ in the sense that


$\int_0^1 \left(\hat\varphi_n(t) - \varphi(t, f_0)\right)^2 dt \xrightarrow{P_0} 0$, as $n \to \infty$ (47)


where $P_0$ corresponds to $\beta = 0$. (For the proof of (47), we refer to Hájek and
Šidák, 1967; see also [A 2.15].) It follows from [A 2.2] and [A 2.5] that (47)
holds also under any sequence of alternatives contiguous with respect to
$\prod_{i=1}^{n} f_0(y_i)$, and hence it holds also under $\beta \ne 0$ and $x_{ni}$ satisfying
$\sum_{i=1}^{n} x_{ni} = 0$ ($n = 1, 2, \ldots$) and

$\lim_{n\to\infty} \max_{1 \le i \le n} \left[ x_{ni}^2 \Big( \sum_{j=1}^{n} x_{nj}^2 \Big)^{-1} \right] = 0.$

Van Eeden (1970) shows how to modify $\hat\varphi_n(t)$ in order to make it nondecreasing,
constant on equidistant intervals, and still consistent. The resulting
estimator of $\beta$, asymptotically efficient, is then either the R-estimator or the
linearized rank estimator corresponding to the statistic


e- . is R=? 

— 5. Ce x Dn at 48 

| % i Bas K,, se ) 

where &;? is the rank of (y,; — Bx,;) among (yx 41 — Box 415 +++) Yn — B2q): 

For a detailed explanation we refer to van Eeden (1970) (location model) or
to Dionne (1981) (linear regression model).

Another possibility is a Fourier series estimator of $\varphi(t, f_0)$. If $I(f_0) < \infty$
then $\varphi(t, f_0)$ has the Fourier expansion


$\varphi(t, f_0) = \sum_{|k|=1}^{\infty} d_k \exp(2\pi i k t)$ (49)

where

$d_k = \int_0^1 \varphi(t, f_0) \exp(-2\pi i k t)\,dt.$ (50)

If $\hat d_{kn}$ is a proper estimator of $d_k$ based on $y_{n1}, \ldots, y_{nn}$, then a plausible estimate
for $\varphi(t, f_0)$ is

$\hat\varphi_n(t) = \sum_{|k|=1}^{m_n} \hat d_{kn} \exp(2\pi i k t)$ (51)


where $m_n \to \infty$ at a suitable rate as $n \to \infty$.
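
The effect of truncating the expansion (49) at $m$ terms, as in (51), can be seen on a score function whose coefficients are known. The sketch below takes the logistic score $\varphi(t, f_0) = 2t - 1$ as an illustrative choice, computes $d_k$ of (50) by numerical integration, and measures the $L_2$ distance between the truncated sum and the true score. It uses the exact coefficients rather than a rank-based estimator $\hat d_{kn}$, so it illustrates only the truncation step; the function names and grid sizes are hypothetical.

```python
import cmath
import math

def score(t):
    """Score function phi(t, f0) = 2t - 1 of the logistic density."""
    return 2.0 * t - 1.0

def fourier_coefficient(k, n=2000):
    """d_k = int_0^1 phi(t) exp(-2 pi i k t) dt, by the midpoint rule."""
    h = 1.0 / n
    return h * sum(score((j + 0.5) * h) * cmath.exp(-2j * math.pi * k * (j + 0.5) * h)
                   for j in range(n))

def partial_sum(t, coeffs):
    """Truncated expansion: sum over 1 <= |k| <= m of d_k exp(2 pi i k t)."""
    return sum(d * cmath.exp(2j * math.pi * k * t) for k, d in coeffs.items()).real

def l2_error(m, grid=400):
    """Integrated squared error of the m-term truncation, on a midpoint grid."""
    coeffs = {k: fourier_coefficient(k) for k in range(-m, m + 1) if k != 0}
    h = 1.0 / grid
    return h * sum((partial_sum((j + 0.5) * h, coeffs) - score((j + 0.5) * h)) ** 2
                   for j in range(grid))

for m in (1, 5, 20):
    print(m, round(l2_error(m), 4))
```

For this score function $d_k = i/(\pi k)$, so the integrated squared error decreases roughly like $2/(\pi^2 m)$; this slow decay is one reason why $m_n$ must grow with $n$.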


2.6. Asymptotic comparison of different estimation procedures 191 


One possibility of estimating d, is as follows: Vx(9) = exp (—2z ikt), |k| 
= 1, 2,..., let 6,, 6, be real numbers such that 0 = 107 = 05, Ie Then 


(k) k 
== S;, a Si 


kn és as 5, (52) 
is an estimator of d;, consistent in probability for all 4, where 
n Ré 
Soe (Qo a 4 5 
6 x n ) Pk hee 1 ( 3) 


and $R_i^{\delta}$ is the rank of $y_{ni} + \delta x_{ni}$ in $(y_{n1} + \delta x_{n1}, \ldots, y_{nn} + \delta x_{nn})$, $i = 1, \ldots, n$.
An analogous estimator based on identically distributed observations was
studied by Beran (1974). If $\{m_n\}$ is a sequence such that


M,—>co and m7/n?>0, as n—> oo (54) 


then $\hat\varphi_n(t)$ of (51) is an estimator of $\varphi(t, f_0)$ consistent in the sense of (47). This
fact is proved in Beran (1974). The resulting asymptotically efficient estimator
of $\beta$ is then the linearized rank estimator based on the statistics


Seen (nan Sr ee 55 
a XLni — Ly n ae 
8 )g (- ra ) (55) 
where R&;-* is the rank of y,; — xi in (Yn, — 2B, ---> Yar — Ln). 

A similar approach based on the Fourier series with respect to the Legendre
polynomials was studied by Hušková (1983).


2.6 Asymptotic comparison of different estimation procedures 


We have mentioned several times that there are close relationships between
different robust estimation procedures. For instance, if the underlying
distribution is known and smooth, all procedures provide asymptotically efficient
estimators. All respective estimators are asymptotically equivalent if the
relationships of the corresponding $\varphi$, $\psi$, and $J$ functions are such as described
in Remark 2.3.4.

The present section is devoted to the mathematical background of some of 
these relationships. 


2.6.1 Asymptotic distribution of the difference of M- and R-estimators 


Suppose that the error distribution $F$ and the design matrix $X_n$ of the model
(2.2.1) and the functions $\varphi(t)$, $0 < t < 1$, and $\psi(x)$, $x \in \mathbb{R}^1$, satisfy the
assumptions (A 8), (A 9), (A 10), and (A 3).

Let w, y and «? be given by (2.3.36), (2.5.7a) and (2.5.7B), respectively. 




Moreover, let


$\bar\psi = \int \psi(x)\,dF(x), \qquad \bar\varphi = \int_0^1 \varphi(t)\,dt.$ (1)


Let $\hat\beta_R$ be the R-estimator corresponding to the function $\varphi$, i.e. $\hat\beta_R$ is any
solution of the minimization (2.5.2). Let $\hat\beta_M$ denote the M-estimator
corresponding to the function $\psi$, i.e. $\hat\beta_M$ is a solution of the system of equations
(2.2.9). The asymptotic relation between $\hat\beta_R$ and $\hat\beta_M$ is expressed in the
following theorem.


Theorem 2.6.1 Under the assumptions (A2), (A3), (A8), (A10) and for
$\gamma \ne 0$, $\omega \ne 0$, the asymptotic distribution of the sequence $\{n^{1/2}(\hat\beta_M - \hat\beta_R)\}_{n=1}^{\infty}$
is p-dimensional normal with expectation 0 and with the covariance matrix


$\left[ \int_0^1 \left( \frac{1}{\omega}\big(\psi(F^{-1}(t)) - \bar\psi\big) - \frac{1}{\gamma}\big(\varphi(t) - \bar\varphi\big) \right)^2 dt \right] \Sigma^{-1}.$ (2)
Proof. It follows from (2.3.53) and from Lemma 2.5.2 that 
nil2(Bye — Bp) ~ mtREA [= mong) — + sty — xp) (3) 
@ ue 


where M()(p°) = (I\(), ..., M(6°)) is given by (2.3.39) and f° is the 
true parameter value. Moreover (2.4.34) implies 


nUAS (y — Xp) wn, = nT, ..., TP) (t 
Tie = Lew oF (6:(6°)), 7 eat eee p- (5) 


nl( Bye — By) wn RE AX’ fe M80) — — of (Ue) (6) 
v(5(B)) = (v(01(8)), ---> v(3n(B))) 
9 F(8(B))) = (o(F (806), ---» o(F(8n(8))))- 


The rest of the proof follows easily from Bunke and Bunke (1986, theorem
2.4.3). □


Theorem 2.6.1 has several corollaries which are of interest in their own right.

Corollary 2.6.1 $n^{1/2}\hat\beta_M \sim n^{1/2}\hat\beta_R$ if and only if

$\varphi(t) = a\psi(F^{-1}(t)) + b$ a.e. in $(0, 1)$ (7)

for $a > 0$, $b \in \mathbb{R}^1$.




Put

$\psi(x) = -\frac{g'(x)}{g(x)}, \quad x \in \mathbb{R}^1,$ (8)


where $g$ is a density such that $\psi$ satisfies (A3) (for instance, $g$ may be any
unimodal density with $I(g) < \infty$). Then $\hat\beta_M$ is the maximum likelihood estimator
corresponding to $g$. Similarly, put


$\varphi(t) = \varphi(t, g), \quad 0 < t < 1,$ (9)


so that $\hat\beta_R$ is the R-estimator, asymptotically efficient in the case $f = g$. The
asymptotic distribution of $n^{1/2}(\hat\beta_M - \hat\beta_R)$ is then normal with the expectation 0
and with the covariance matrix


{ : — ) aE 2 (10) 
1 ATO tea 
Donlie e\ ee) ae 


Under (8) and (9), we have the following corollaries:


Corollary 2.6.2 Let $\psi$ and $\varphi$ satisfy (8) and (9), respectively, with $g$ being the
density of the normal distribution $N(0, \sigma^2)$, $\sigma^2 > 0$. Then $n^{1/2}\hat\beta_M \sim n^{1/2}\hat\beta_R$ if
and only if $f$ is normal $N(\mu, \lambda^2)$ with some $\mu \in \mathbb{R}^1$, $\lambda > 0$.


Corollary 2.6.3 Let $\psi$ and $\varphi$ satisfy (8) and (9), respectively, with $g$ being the
logistic density. Then, under the assumption of symmetry of $f$, $n^{1/2}\hat\beta_M \sim n^{1/2}\hat\beta_R$
if and only if $f = g$.

Remark 2.6.1 Let $\psi$ and $\varphi$ satisfy (8) and (9), respectively, with $g$ being the
density of the double exponential distribution. Then $n^{1/2}\hat\beta_M \sim n^{1/2}\hat\beta_R$ for any
symmetric error distribution $f$.
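
Remark 2.6.1 can be read off condition (7): for the double exponential $g$ one has $\psi(x) = \operatorname{sign}(x)$ and $\varphi(t, g) = \operatorname{sign}(t - 1/2)$, and for a symmetric $F$ the quantile $F^{-1}(t)$ has the sign of $t - 1/2$, so (7) holds with $a = 1$, $b = 0$. The sketch below checks this identity on a grid for three symmetric distributions; the choice of distributions and the grid are illustrative assumptions.

```python
import math
from statistics import NormalDist

def sign(u):
    return (u > 0) - (u < 0)

def psi(x):
    """psi(x) = -g'(x)/g(x) for the double-exponential density g (x != 0)."""
    return sign(x)

def phi(t):
    """phi(t, g) = psi(G^{-1}(t)) = sign(t - 1/2) for the same g."""
    return sign(t - 0.5)

# Quantile functions F^{-1} of three symmetric error distributions (illustrative choices).
quantiles = {
    "normal": NormalDist().inv_cdf,
    "logistic": lambda t: math.log(t / (1.0 - t)),
    "Cauchy": lambda t: math.tan(math.pi * (t - 0.5)),
}

grid = [j / 200 for j in range(1, 200) if j != 100]   # t in (0, 1), t != 1/2
agree = {name: all(phi(t) == psi(q(t)) for t in grid) for name, q in quantiles.items()}
print(agree)
```

The check says nothing about the regularity assumptions of Theorem 2.6.1; it only verifies the score identity on which the remark rests.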


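Corollary 2.6.4 Let $\hat\beta_L$ be the least squares estimator, i.e. $\hat\beta_L = \hat\beta_M$ with $\psi(x) = x$,
$x \in \mathbb{R}^1$; let $\hat\beta_R$ correspond to a function $\varphi(t)$, $t \in (0, 1)$. Then $n^{1/2}\hat\beta_L \sim n^{1/2}\hat\beta_R$ if
and only if


$\varphi(t) = aF^{-1}(t) + b$ for $a > 0$, $b \in \mathbb{R}^1$. (11)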


2.6.2 Asymptotic distribution of the difference 
of linearized rank estimator and R-estimator 


Let $\hat\beta_0$ be the linearized rank estimator of $\beta$, defined in (2.5.28), with the least
squares estimator in the role of the initial estimator $\hat\beta_1$. Let $\hat\beta_R$ be the rank
estimator (both $\hat\beta_0$ and $\hat\beta_R$ are supposed to correspond to the score-generating
function $\varphi$).




Theorem 2.6.2 Suppose that the assumptions (A2), (A3), (A8), (A10) are
satisfied. The asymptotic distribution of $n^{1/2}(\hat\beta_0 - \hat\beta_R)$ is then p-dimensional
normal with the expectation 0 and the covariance matrix


(1 2 a aaa ies fi g(t) — ¢) F-1(t) at] 2 (12) 
y 


where $\sigma^2 = \int x^2\,dF(x) - \left(\int x\,dF(x)\right)^2$.
Proof. It follows from Lemma 2.5.1 that
n-VPZ-US(y — XBx) — Sy — XB,)] ~ —yn"(Br — Br); (13) 
and (2.5.2) and (2.5.28) imply 
n2(Bp — Ba) 
A A 1 A A 
~ n2(BR — By) + aE n?X-US8,(y — XBr) — Say —XPi)J- (14) 


Combining (13) and (14), we get 
n2(Bp to B2) as ( = =, n2(Bp oi B,). 


The rest of the proof then follows from Theorem 2.6.1. □


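Corollary 2.6.5 $n^{1/2}\hat\beta_0 \sim n^{1/2}\hat\beta_R$ if and only if either $\varphi(t) - \bar\varphi = a\varphi(t, f)$,
$0 < t < 1$, $a > 0$, or $\varphi(t) = aF^{-1}(t) + b$, $0 < t < 1$; $a > 0$, $b \in \mathbb{R}^1$.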


2.7 Confidence intervals for regression coefficients
based on ranks


Let $y_1, \ldots, y_n$ be independent observations such that $y_i$ has the distribution
function $F(y - \beta x_i)$, $i = 1, \ldots, n$, where $F$ is assumed to be continuous but
otherwise unknown. A family $\mathcal{B}(y)$ of confidence sets for $\beta$ at the confidence
level $(1 - \alpha)$ may be based on the rank tests of hypotheses $H(\beta^0): \beta = \beta^0$ in
the following way. If for each $\beta^0 \in \mathbb{R}^1$, $A(\beta^0)$ is the acceptance region of an $\alpha$-test
for testing $H(\beta^0)$, then


$\mathcal{B}(y) = \{\beta \in \mathbb{R}^1 : y \in A(\beta)\}.$ (1)


For small and moderate values of $n$, the acceptance regions $A(\beta^0)$ can be found
from tables of the null distribution of rank statistics. More specifically, let
$S_n(y - \beta x)$ be the simple linear rank statistic


$S_n(y - \beta x) = \sum_{i=1}^{n} x_i\, a_n(R_i^{\beta})$ (2)


2.7. Confidence intervals for regression coefficients based on ranks 195 


where $R_i^{\beta}$ is the rank of $y_i - \beta x_i$ among $(y_1 - \beta x_1, \ldots, y_n - \beta x_n)$ and the
scores $a_n(i)$, $i = 1, \ldots, n$, are generated by a function $\varphi$, nondecreasing and
square-integrable on $(0, 1)$, either by (2.4.22) or by (2.4.23). Suppose that the
two-sided $\alpha$-test of the hypothesis $H(\beta^0): \beta = \beta^0$ accepts $H(\beta^0)$ when


$C_n^{(1)} \le S_n(y - \beta^0 x) \le C_n^{(2)}.$ (3)


If either $a_n(i) + a_n(n - i + 1) = \text{const.}$ or $x_i + x_{n-i+1} = \text{const.}$, $i = 1, \ldots, n$,
then the distribution of $S_n(y - \beta^0 x)$ is symmetric and
$C_n^{(2)} = 2\bar{x}_n \sum_{i=1}^{n} a_n(i) - C_n^{(1)}$.
Noting the fact that $S_n(y - \beta x)$ is a nonincreasing function of $\beta$ with
probability 1 (cf. [A 3.8]), we get that the corresponding confidence region is an
interval


$\beta^-(y) \le \beta \le \beta^+(y)$ (4)

where

$\beta^- = \sup\{\beta : S_n(y - \beta x) > C_n^{(2)}\}$ (5)

and

$\beta^+ = \inf\{\beta : S_n(y - \beta x) < C_n^{(1)}\}.$ (6)


The probability of (4) is independent of both $\beta$ and $F$ provided $F$ is absolutely
continuous. However, for a given sample size $n$, the constants $C_n^{(1)}$ and $C_n^{(2)}$
for which (4) has exactly probability $(1 - \alpha)$ may not exist. To avoid
randomization, one may prefer an $\alpha$ for which such values exist. For large sample
sizes, it is enough that the constants $C_n^{(1)}$, $C_n^{(2)}$ are chosen in such a way
that the probability $(1 - \alpha(n))$ of (4) tends to the specified value $(1 - \alpha)$ as
$n$ tends to infinity.
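
The inversion of the rank test can be sketched as follows. The code assumes Wilcoxon scores $a_n(i) = i/(n+1)$, regression constants centred at their mean (so that the statistic has null expectation zero), and the large-sample critical value $a_n A\,\Phi^{-1}(1 - \alpha/2)$ with $A^2 = 1/12$ of Section 2.7.1 in place of exact tabulated constants. The data, the design and all function names are hypothetical. Since the statistic is a nonincreasing step function of $\beta$ that can jump only at the pairwise slopes $(y_j - y_i)/(x_j - x_i)$, it is enough to evaluate it once between consecutive slopes.

```python
import math
import random
from statistics import NormalDist

def rank_statistic(beta, x, y):
    """S_n(y - beta*x) = sum_i (x_i - xbar) * R_i / (n + 1), R_i = rank of y_i - beta*x_i."""
    n = len(x)
    xbar = sum(x) / n
    order = sorted(range(n), key=lambda i: y[i] - beta * x[i])
    return sum((x[i] - xbar) * (rank + 1) / (n + 1) for rank, i in enumerate(order))

def rank_confidence_interval(x, y, alpha=0.05):
    """Invert the two-sided rank test; returns (lower, upper, critical value)."""
    n = len(x)
    xbar = sum(x) / n
    a_n = math.sqrt(sum((v - xbar) ** 2 for v in x))
    # large-sample critical value a_n * A * Phi^{-1}(1 - alpha/2), A^2 = 1/12 for phi(t) = t
    crit = a_n * math.sqrt(1.0 / 12.0) * NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # the statistic is constant between consecutive pairwise slopes
    slopes = sorted({(y[j] - y[i]) / (x[j] - x[i])
                     for i in range(n) for j in range(i + 1, n) if x[i] != x[j]})
    probes = ([slopes[0] - 1.0]
              + [(s + t) / 2.0 for s, t in zip(slopes, slopes[1:])]
              + [slopes[-1] + 1.0])
    values = [rank_statistic(b, x, y) for b in probes]
    m = len(slopes)
    # piece k of the step function lies between slopes[k-1] and slopes[k]
    lower = max((slopes[k] for k in range(m) if values[k] > crit), default=-math.inf)
    upper = min((slopes[k - 1] for k in range(1, m + 1) if values[k] < -crit),
                default=math.inf)
    return lower, upper, crit

rng = random.Random(0)
x = [float(i) for i in range(30)]                       # hypothetical design
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 3.0) for xi in x]  # hypothetical data, slope 2
lower, upper, crit = rank_confidence_interval(x, y)
print(f"approximate 95% interval for the slope: [{lower:.3f}, {upper:.3f}]")
```

Evaluating the statistic on every piece costs on the order of $n^3 \log n$ operations; because the step function is monotone, a bisection over the sorted slopes would be the natural refinement for larger samples.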

The asymptotic relative efficiency of two confidence procedures is usually
measured by the limit of the ratios of the sample sizes necessary for attaining
the same probabilities of covering the false parameter value. In such a case, the
confidence intervals based on the asymptotic null distribution of the rank
statistics have asymptotic efficiencies relative to the standard confidence
intervals equal to the Pitman asymptotic relative efficiencies of the
corresponding rank and standard tests.

Alternatively, the efficiency might be measured in terms of lengths of the
intervals. It will be shown in Section 2.7.1 that the ratio of the squares of
lengths $(L_n')^2/L_n^2$ of the standard and the rank confidence intervals,
respectively, tends in probability to the relative asymptotic efficiency of both
procedures. Moreover, a multiple of $L_n^2$ is shown to be a consistent estimator of
the asymptotic variance of the corresponding R-estimator of $\beta$.

The length $L_n$ of the confidence interval is a random variable and cannot
be bounded unless restrictions are placed on $F$, since $L_n$ tends in probability
to infinity as $F$ becomes more and more spread out. Confidence intervals of
length not exceeding a given number can however be obtained by taking
observations $X_1, X_2, \ldots$ sequentially.




Stein (1956) first suggested a two-stage procedure for obtaining a
bounded-length confidence interval in the case of a normal population. Later Chow and
Robbins (1965) proposed a sequential procedure for the mean of a population
with finite variance; their procedure was extended by Gleser (1965) to the
linear regression model. An analogous sequential procedure based on ranks
was investigated by Geertsema (1970). Further work on this problem
concerning the linear regression model is due to Ghosh and Sen (1972). We shall
consider this problem in Section 2.7.2.


2.7.1 Asymptotic efficiency of rank confidence intervals 


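Let $y_{n1}, \ldots, y_{nn}$ be independent observations such that $y_{ni}$ is distributed
according to the distribution function $F(y - \beta^0 x_{ni})$, $i = 1, \ldots, n$; suppose that
$I(F) < \infty$.

Denote


$a_n^2 = \sum_{i=1}^{n} (x_{ni} - \bar{x}_n)^2, \qquad \bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_{ni}$ (7)


and suppose that Noether’s condition is satisfied, i.e.


$\lim_{n\to\infty} \max_{1 \le i \le n} \left[(x_{ni} - \bar{x}_n)^2 / a_n^2\right] = 0.$ (8)


Consider the statistics


$S_n(y - \beta x) = \sum_{i=1}^{n} (x_{ni} - \bar{x}_n)\, a_n(R_{ni}^{\beta})$ (9)


where $R_{ni}^{\beta}$ is the rank of $y_{ni} - \beta x_{ni}$ in $(y_{n1} - \beta x_{n1}, \ldots, y_{nn} - \beta x_{nn})$, and the
scores $a_n(i)$, $i = 1, \ldots, n$, are generated by a function $\varphi$, nondecreasing and
square-integrable on $(0, 1)$, either by (2.4.22) or by (2.4.23).

Introduce the confidence set


$\mathcal{B}_n(y) = \{\beta \in \mathbb{R}^1 : |S_n(y - \beta x)| \le a_n K_\alpha\}$ (10)

where

$K_\alpha = A\,\Phi^{-1}\left(1 - \frac{\alpha}{2}\right), \quad 0 < \alpha < 1,$ (11)

$A^2 = \int_0^1 (\varphi(t) - \bar\varphi)^2\,dt, \qquad \bar\varphi = \int_0^1 \varphi(t)\,dt$ (12)


and $\Phi^{-1}$ is the inverse standard normal distribution function.
The following lemma states that the limiting probability of covering the
true value by $\mathcal{B}_n(y)$ is $(1 - \alpha)$.


Lemma 2.7.1 Under (7)–(12) and for any $\alpha \in (0, 1)$,

$\lim_{n\to\infty} P_{\beta^0}\{\beta^0 \in \mathcal{B}_n(y)\} = 1 - \alpha.$ (13)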


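Proof. $P_{\beta^0}\{\beta^0 \in \mathcal{B}_n(y)\} = P_{\beta^0}\{|S_n(y - \beta^0 x)| \le a_n K_\alpha\}$

$= P_0\{|S_n(y)| \le a_n K_\alpha\} \to 2\Phi(K_\alpha/A) - 1 = 1 - \alpha$
as $n \to \infty$, where the convergence follows from Theorem 2.4.3. □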
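
The proof reduces coverage to the event $|S_n(y - \beta^0 x)| \le a_n K_\alpha$, so the limiting probability can be checked by simulation without constructing the interval. The sketch below assumes Wilcoxon scores $a_n(i) = i/(n+1)$ (hence $A^2 = 1/12$), an equally spaced design and Cauchy errors; these choices, the seed and the function names are hypothetical. Because the ranks of the true residuals form a uniformly distributed permutation under any continuous $F$, the estimated coverage should not depend on the error law.

```python
import math
import random
from statistics import NormalDist

def wilcoxon_statistic(x, residuals):
    """S_n = sum_i (x_i - xbar) * a_n(R_i) with Wilcoxon scores a_n(i) = i/(n+1)."""
    n = len(x)
    xbar = sum(x) / n
    order = sorted(range(n), key=lambda i: residuals[i])
    return sum((x[i] - xbar) * (rank + 1) / (n + 1) for rank, i in enumerate(order))

def coverage(n=50, alpha=0.05, beta0=2.0, replications=2000, seed=1):
    """Monte Carlo estimate of P{ |S_n(y - beta0*x)| <= a_n * K_alpha }."""
    rng = random.Random(seed)
    x = [i / n for i in range(n)]                     # hypothetical design points
    xbar = sum(x) / n
    a_n = math.sqrt(sum((v - xbar) ** 2 for v in x))
    k_alpha = math.sqrt(1.0 / 12.0) * NormalDist().inv_cdf(1.0 - alpha / 2.0)
    hits = 0
    for _ in range(replications):
        # heavy-tailed (Cauchy) errors, generated by inversion
        errors = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
        y = [beta0 * xi + e for xi, e in zip(x, errors)]
        residuals = [yi - beta0 * xi for xi, yi in zip(x, y)]
        hits += abs(wilcoxon_statistic(x, residuals)) <= a_n * k_alpha
    return hits / replications

print(coverage())
```

For finite $n$ the exact null variance of $S_n$ with these scores is $a_n^2\, n/(12(n+1))$, slightly below $a_n^2 A^2$, so the simulated coverage is expected to sit marginally above the nominal level.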


In view of the monotonicity of $S_n(y - \beta x)$ with respect to $\beta$ we can write


$\mathcal{B}_n(y) = (\beta_n^-(y), \beta_n^+(y))$ (14)

where

$\beta_n^- := \beta_n^-(y) = \sup\{\beta \mid S_n(y - \beta x) > a_n K_\alpha\}$ (15)

and

$\beta_n^+ := \beta_n^+(y) = \inf\{\beta \mid S_n(y - \beta x) < -a_n K_\alpha\}.$ (16)

Denote

$L_n = \beta_n^+ - \beta_n^-$ (17)


the length of the confidence interval (14). The following theorem shows that $a_n^2 L_n^2$
is a consistent estimator of a multiple of the asymptotic variance of the R-estimator
of $\beta$ based on $S_n$.


Theorem 2.7.1 Under (7)–(12) and (14)–(17),

$a_n L_n \to \frac{2A}{\gamma}\,\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)$ (18)

in probability under the hypothesis $\beta = \beta^0$, as $n \to \infty$, with $\gamma$ defined in (2.5.7).
Proof. a,(B;, — f°) and a,(B; — f°) are asymptotically normal, 
2 
N ( A @-1 ( ae = = (19) 
y a a a 


respectively, and thus they are asymptotically bounded in probability. Actu- 
ally, we have 


t 
lim Pp{a,(B, — B°) > t} = lim Pz {* (y _ S — ) ) oe uk] 


== hi ede Day ae > nk} =1 — of 2 (1+) 
n—>oo : an J a y 


(where we have used [A 2.13)). 
We may proceed analogously concerning B,. 
Lemma 2.5.3 then implies that


$\lim_{n\to\infty} P_{\beta^0}\{|S_n(y - \beta_n^- x) - S_n(y - \beta^0 x) + a_n^2(\beta_n^- - \beta^0)\gamma| \ge \varepsilon a_n\} = 0$ (20)


and

$\lim_{n\to\infty} P_{\beta^0}\{|S_n(y - \beta_n^+ x) - S_n(y - \beta^0 x) + a_n^2(\beta_n^+ - \beta^0)\gamma| \ge \varepsilon a_n\} = 0$ (21)


hold for any $\varepsilon > 0$. (15), (16), (20), and (21) then imply that


$\lim_{n\to\infty} P_{\beta^0}\{|a_n(\beta_n^+ - \beta_n^-)\gamma - 2K_\alpha| \ge \varepsilon\} = 0.$


Suppose now that $\sum_{i=1}^{n} x_{ni} = 0$. The problem is that of asymptotic efficiency
of the confidence interval (14) with respect to the standard confidence interval


[21 Se | (22) 


1 4 ; 
where S? = Tene x (y; — xB)? and f, is the least squares estimator. 
x. i=1 


Following Lehmann (1963), we shall measure the efficiency in terms of the
probability of covering false values, more precisely, in terms of the probability
that the intervals cover the value $\beta^0 + \delta n^{-1/2}$. Then it follows from the relation
between the confidence intervals and the tests on which they are based and
from the asymptotic properties of the rank tests (cf. [A 2.13]) that the intervals
(14) which are based on $n$ observations and the intervals (22) based on $n'$
observations will have the same asymptotic probabilities of covering the
values $\beta^0 + \delta n^{-1/2}$ as $n \to \infty$, provided


$\frac{n'}{n} \to \sigma^2\gamma^2 A^{-2}$ as $n \to \infty$ (23)


where $\sigma^2$ is the variance of $F$. In this sense, the right-hand side of (23) is the
relative asymptotic efficiency of the two sets of intervals.

Alternatively, the efficiency might be measured in terms of the lengths of
the intervals. Let $L_n'$ denote the length of the interval in (22). Then it follows
from Theorem 2.7.1 and from (22) that


$\frac{(L_n')^2}{L_n^2} \to \sigma^2\gamma^2 A^{-2}$ (24)


in probability as $n \to \infty$ under the hypothesis $\beta = \beta^0$. If the intervals (14)
are based on $n$ and the intervals (22) on $n'$ observations, respectively, the
ratio $L_{n'}'/L_n$ will tend in probability to one, provided (23) holds. Thus the
right-hand side of (23) is also a reasonable measure of efficiency when the comparison
is made in terms of the length of intervals.




2.7.2 Bounded length confidence interval based on the Wilcoxon test 


Let $y_1, y_2, \ldots$ be independent observations such that $y_n$ is distributed
according to the distribution $F(y - \beta x_n)$, $n = 1, 2, \ldots$; suppose that $I(F) < \infty$.
We want to determine a confidence interval $I_n = \{\beta \mid \beta_n^- < \beta < \beta_n^+\}$ such
that

$P_\beta(\beta \in I_n) = 1 - \alpha$ (25)

and

$0 < L_n = \beta_n^+ - \beta_n^- \le 2d$ (26)


for some given $d$ ($> 0$).

If $F$ is not known, no fixed-sample procedure is available which guarantees
(26) for all $F$ from a large class (say, that of all absolutely continuous
distribution functions), since $L_n$ tends to infinity in probability as $F$ becomes more
and more spread out. Confidence intervals for $\beta$ of length not exceeding $2d$
can, however, be obtained by taking the observations $y_1, y_2, \ldots$ sequentially
as follows. Having observed $y_1, \ldots, y_n$, calculate $(\beta_n^-, \beta_n^+)$ of (15) and (16) for
$n = 1, 2, \ldots$, and continue taking observations until $L_n \le 2d$; the first
integer for which this is the case we shall denote by $N(d)$. Moreover, denote


$x_{(n)} = (x_1, \ldots, x_n)$ (27)

and

$a_n^2 = \sum_{i=1}^{n} (x_i - \bar{x}_n)^2, \qquad \bar{x}_n = \frac{1}{n}\sum_{j=1}^{n} x_j.$ (28)

The following assumptions are imposed on $x_{(n)}$:

(A12) $\max_{1 \le i \le n} |x_{ni}^*| = a_n^{-1} \max_{1 \le i \le n} |x_i - \bar{x}_n| = O(n^{-1/2})$,

(A13) $\lim_{n\to\infty} n^{-1} a_n^2 = K_0 > 0$.


(A14) Put $Q(a) := (n + 1 - a)\,a_n^2 + (a - n)\,a_{n+1}^2$ if $n \le a \le n + 1$, $n = 0, 1, \ldots$,
where we set $a_0^2 = 0$. We assume that $Q(a)$ is nondecreasing in $a$
and that

$\lim_{n\to\infty} \frac{Q(n b_n)}{Q(n)} = s(b)$ whenever $\lim_{n\to\infty} b_n = b$, (29)


$s(b)$ being strictly increasing with $s(1) = 1$.


The assumptions (A12)—(A14) represent conditions on the trend of the 
coefficients x,, X2, ..., and, by Ghosh and Sen (1972), they are satisfied in the 
majority of practical situations. For example, they are satisfied in the two- 


sample situation (t;,—=0, %,=1, 1=1,2,... with |a;,| = (2n)-2/2, 
a 

Get aoe (n = 1,2,...); Q(2) = —), forza, =a+th,h>0 

; 2, 

G@a= 125 7..), Ole. 




Let $R_i^{\beta} = \sum_{j=1}^{n} u(y_i - y_j - (x_i - x_j)\beta)$ be the rank of $y_i - \beta x_i$ among
$y_1 - \beta x_1, \ldots, y_n - \beta x_n$; $\beta \in \mathbb{R}^1$. Consider the Wilcoxon rank statistic based on
$y_1, \ldots, y_n$:

$S_n(y - \beta x) = \sum_{i=1}^{n} (x_i - \bar{x}_n)\, \frac{R_i^{\beta}}{n + 1},$ (30)


and the confidence region


$I_n(y) = \left\{\beta : |S_n(y - \beta x)| \le \frac{1}{\sqrt{12}}\, a_n \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right\}.$ (31)

Then $I_n(y) = (\beta_n^-, \beta_n^+)$ with $\beta_n^-$, $\beta_n^+$ defined as in (15) and (16); it follows from
Lemma 2.7.1 that

$\lim_{n\to\infty} P_{\beta^0}\{\beta^0 \in I_n(y)\} = 1 - \alpha.$


Define the stopping variable to be the first integer $N(d) \ge n_0$ for which
$L_{N(d)} = \beta_{N(d)}^+ - \beta_{N(d)}^- \le 2d$, where $n_0$ is a positive integer. Consider the interval


$I_{N(d)} = \{\beta : \beta_{N(d)}^- < \beta < \beta_{N(d)}^+\}.$ (32)

Having defined a sequential procedure in the above way, two questions
immediately arise:

(a) What is the behaviour of $N(d)$?

(b) What is the coverage probability of the procedure?

These questions can, under certain assumptions, be answered asymptotically
as $d \to 0$; the problem is still open for fixed $d > 0$.
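
The stopping rule can be sketched on simulated data. The code below assumes Wilcoxon scores $R/(n+1)$, the critical value of (31), a two-sample design $x_n \in \{0, 1\}$ taken alternately, standard normal errors and an initial sample size $n_0 = 10$; all of these, together with the function names and the finite length of the data stream, are assumptions of the illustration. Observations are added one at a time and sampling stops at the first $n \ge n_0$ for which the rank interval has length at most $2d$.

```python
import math
import random
from statistics import NormalDist

def interval_length(x, y, alpha=0.05):
    """Length L_n of the Wilcoxon rank interval for the slope, scores R/(n+1)."""
    n = len(x)
    xbar = sum(x) / n
    crit = (math.sqrt(sum((v - xbar) ** 2 for v in x) / 12.0)
            * NormalDist().inv_cdf(1.0 - alpha / 2.0))

    def stat(beta):
        order = sorted(range(n), key=lambda i: y[i] - beta * x[i])
        return sum((x[i] - xbar) * (r + 1) / (n + 1) for r, i in enumerate(order))

    slopes = sorted({(y[j] - y[i]) / (x[j] - x[i])
                     for i in range(n) for j in range(i + 1, n) if x[i] != x[j]})
    if not slopes:
        return math.inf
    probes = ([slopes[0] - 1.0]
              + [(s + t) / 2.0 for s, t in zip(slopes, slopes[1:])]
              + [slopes[-1] + 1.0])
    values = [stat(b) for b in probes]
    m = len(slopes)
    lower = max((slopes[k] for k in range(m) if values[k] > crit), default=-math.inf)
    upper = min((slopes[k - 1] for k in range(1, m + 1) if values[k] < -crit),
                default=math.inf)
    return upper - lower

def stopping_time(x, y, d, n0=10):
    """First n >= n0 with L_n <= 2d; returns len(x) if the stream is exhausted first."""
    for n in range(n0, len(x) + 1):
        if interval_length(x[:n], y[:n]) <= 2.0 * d:
            return n
    return len(x)

rng = random.Random(3)
x = [float(i % 2) for i in range(120)]          # hypothetical two-sample design
y = [1.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
n_stop = stopping_time(x, y, d=0.6)
print("N(0.6) =", n_stop, " L_N =", round(interval_length(x[:n_stop], y[:n_stop]), 3))
```

Running the rule for two values of $d$ on the same data stream gives stopping times ordered in the opposite way to the values of $d$, which is immediate from the definition of $N(d)$.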

Theorem 2.7.2 Under the assumptions made above,

(i) $N(d)$ is a nonincreasing function of $d$ ($> 0$);

(ii) $N(d)$ is finite for all $d > 0$ with probability 1;

(iii) $\lim_{d\to 0} N(d) = \infty$ with probability 1;

(iv) $\lim_{d\to 0} P_\beta\{\beta \in I_{N(d)}\} = 1 - \alpha$.

Proof. (i) The monotonicity follows directly from the definition of $N(d)$.

(ii) For any fixed $d > 0$,

$P_\beta(N(d) = \infty) = P_\beta\Big(\bigcap_{n=1}^{\infty} \{N(d) > n\}\Big) = \lim_{n\to\infty} P_\beta(N(d) > n) \le \lim_{n\to\infty} P_\beta(L_n > 2d)$

$= \lim_{n\to\infty} P_\beta(a_n L_n > 2a_n d) = 0$ by (19) and assumption (A13).




(iii) lim N(d) = oo with probability 1 if and only if 


d—0 


K>0 d>0 d’<d 


ae PL, ENE iB fs (33) 


Monotonicity of M(d) implies that 


UN UW) <K=U A {v(>) <a] 


K>0 d>0 d’<d K>0 v=1 v 


Se 2 


K>0 »=1 nSK Usp K>0 nSK 


and the convergence a.s. follows from P(E, > 0) = 1 for any n. 


(iv) We shall only sketch the rather delicate proof of the last proposition of
Theorem 2.7.2. The ideas are due to Anscombe (1952), Geertsema (1970), and
Ghosh and Sen (1972).

It follows from Theorem 2.7.1, from assumption (A13) and from the definition
of $N(d)$ that $d^2 N(d) = O_p(1)$. We have


$P_\beta\{\beta \in I_{N(d)}(y)\} = P_0\{|S_{N(d)}(y)| \le a_{N(d)} K_\alpha\}.$


By Theorem 1 of Anscombe (1952), the last probability tends to $2\Phi(A^{-1}K_\alpha) - 1$
$= 1 - \alpha$, provided the following lemma holds.


Lemma 2.7.2 For any positive ¢ and n, there exists a 6 > 0 such that 


P| sup |Su(Yn) — Sa(Yn)| > | <e. (34) 
n’:|n—n'|<6n 

Proof. If follows from assumption (A14) that for any 6’ > 0 there exists a 
An 


6>O0Osuch that sup {1 — 0% 


|n’—n|<dn an 
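Let $R_n^0 = (R_{n1}^0, \ldots, R_{nn}^0)$ and let $\mathcal{B}_n = \mathcal{B}(R_n^0)$ be the $\sigma$-algebra generated by
$R_n^0$, $n \ge 1$. Then $\{S_n(y_n), \mathcal{B}_n, n \ge 1\}$ is a martingale (see [A 3.2] for a definition).
Indeed,


$E(S_{n+1} \mid \mathcal{B}_n) = (x_{n+1} - \bar{x}_{n+1})\, E\left(\frac{R_{n+1,n+1}^0}{n+2} \,\Big|\, \mathcal{B}_n\right) + \sum_{i=1}^{n} (x_i - \bar{x}_{n+1})\, E\left(\frac{R_{n+1,i}^0}{n+2} \,\Big|\, \mathcal{B}_n\right).$ (35)


Since $E\left(\frac{R_{n+1,n+1}^0}{n+2} \,\Big|\, \mathcal{B}_n\right) = \frac{1}{2}$ and

$E\left(\frac{R_{n+1,i}^0}{n+2} \,\Big|\, \mathcal{B}_n\right) = \frac{R_{ni}^0}{n+1}$

for any $1 \le i \le n$, we have from (35) that


$E(S_{n+1} \mid \mathcal{B}_n) = \sum_{i=1}^{n} (x_i - \bar{x}_n)\, \frac{R_{ni}^0}{n+1} = S_n$ a.s.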
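
The martingale identity can be verified in exact arithmetic. The sketch below assumes independent, identically and continuously distributed residuals, so that, given the ranks of the first $n$ observations, the new observation falls into each of the $n + 1$ gaps with probability $1/(n+1)$. It then computes the conditional mean of $S_{n+1}$ and compares it with $S_n$. The design values, the ranks and the function names are hypothetical.

```python
from fractions import Fraction

def conditional_mean_score(r, n):
    """E[R_{n+1,i}/(n+2) | R_{n,i} = r] when the new point is equally likely in each gap."""
    # the new point falls below observation i in r of the n + 1 gaps, raising its rank by one
    p_up = Fraction(r, n + 1)
    expected_rank = (r + 1) * p_up + r * (1 - p_up)
    return expected_rank / (n + 2)

def conditional_mean_statistic(x, ranks, x_new):
    """E[S_{n+1} | ranks of the first n observations], scores R/(n+1)."""
    n = len(x)
    xbar_new = (sum(x) + x_new) / (n + 1)
    # the rank of the new observation is uniform on 1..n+1, so its mean score is 1/2
    total = (x_new - xbar_new) * Fraction(1, 2)
    total += sum((xi - xbar_new) * conditional_mean_score(r, n) for xi, r in zip(x, ranks))
    return total

def statistic(x, ranks):
    """S_n = sum_i (x_i - xbar_n) * R_{ni} / (n + 1)."""
    n = len(x)
    xbar = sum(x) / n
    return sum((xi - xbar) * Fraction(r, n + 1) for xi, r in zip(x, ranks))

x = [Fraction(v) for v in (0, 3, 1, 7, 2)]      # hypothetical regression constants
ranks = [2, 5, 1, 4, 3]                          # hypothetical ranks of the residuals
x_new = Fraction(4)
print(conditional_mean_statistic(x, ranks, x_new) == statistic(x, ranks))
```

The equality holds for every choice of constants and ranks, because $x_{n+1} - \bar{x}_{n+1} = n(\bar{x}_{n+1} - \bar{x}_n)$ and the ranks sum to $n(n+1)/2$, so the two correction terms cancel.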


Now, we get from the Kolmogorov inequality for martingales (see [A 3.3]) 
that 


P| sup [Sw ape S,| = it 
|n’—n|<6n 

1 n + [dn] n — [dn] 
(dy dee eee 


for n > m and appropriate 6 > 0. 


Bounded length sequential confidence intervals based on ranks were further
studied by Hušková (1982). Sequential confidence intervals based on
M-estimators were studied by Jurečková and Sen (1981a, b).


2.8 References 


Adichie, J. N. (1967). ‘Estimate of regression parameters based on rank tests.’ Ann. 
Math. Statist., 38, 894—904. 

Andrews, D. F., Bickel, P. J.. Hampel, F. R., Huber, P. J., Rogers, W. H, and Tukey, 
J. W. (1972). Robust estimates of location. Survey and Advances. Princeton University 
Press, Princeton. 

Anscombe, F. J. (1952). ‘Large-sample theory of sequential estimation.’ Proc. Camb. 
Phil. Soc., 48, 600—607. 

Anscombe, F. J. (1967). ‘Topics in the investigation of linear relations fitted by the 
method of least squares.’ J. Royal Statist. Soc., Ser. B, 29, 1—52. 

Antille, A. (1974). ‘A linearized version of the Hodges-Lehmann estimator.’ Ann. Statist.,
2, 1308–1313.

Antoch, J., Collomb, G., and Hassani, S. (1984). ‘Robustness in parametric and
nonparametric regression estimation: An investigation by computer simulation.’
COMPSTAT 1984, pp. 49–54. Physica-Verlag, Vienna.

Azencott, R., Birgé, L., Costa, V., Dacunha-Castelle, D., Deniau, C., Deshayes, J.,
Huber-Carol, C., Jolivaldt, P., Oppenheim, G., Picard, D., Trécourt, P., and Viano, C. (1977).
‘Théorie de la robustesse et estimation d’un paramètre.’ Astérisque, 43–44.

Bassett, G. W., and Koenker, R. W. (1978). ‘The asymptotic distribution of the least 
absolute error estimator.’ J. Amer. Statist. Assoc. 73, 618—622. 

Beran, R. (1974). ‘Asymptotically efficient adaptive rank estimates in location models.’ 
Ann. Statist., 2, 63—74. 

Beran, R. (1977a). ‘Robust location estimates.’ Ann. Statist., 5, 431–444.

Beran, R. (1977b). ‘Minimum Hellinger distance estimates for parametric models.’
Ann. Statist., 5, 445–463.

Beran, R. (1978). ‘An efficient and robust adaptive estimator of location.’ Ann. Statist.,
6, 292–313.

Bickel, P. J. (1965). ‘On some robust estimates of location.’ Ann. Math. Statist., 36, 
847 —858. 


2.8. References 203 


Bickel, P. J. (1973). ‘On some analogues to linear combinations of order statistics in the 
linear model.’ Ann. Statist., 1, 597—616. 

Bickel, P. J. (1975). ‘One-step Huber estimates in the linear model.’ J. Amer. Statist. 
Assoc., 70, 428—434. 

Bickel, P. J. (1976). ‘Another look at robustness: A review of reviews and some new
developments.’ Scand. J. of Statistics, 3, 145–168.

Bickel, P. J. (1981). ‘Quelques aspects de la statistique robuste.’ Lecture Notes in
Mathematics No. 876, pp. 1–72, Springer-Verlag, New York.

Bickel, P. J. (1982). ‘On adaptive estimators.’ Ann. Statist., 10, 647–671.

Bickel, P. J., and Lehmann, E. L. (1975). ‘Descriptive statistics for nonparametric models.
I. Introduction, II. Location.’ Ann. Statist., 3, 1038–1044 and 1045–1069.

Birnbaum, A., and Laska, E. (1967). ‘Optimal robustness: A general method with
applications to linear estimates of location.’ J. Amer. Statist. Assoc., 62, 1230–1240.

Boos, D. D. (1979). ‘A differential for L-statistics.’ Ann. Statist., 7, 955–959.

Box, G. E. P. (1953). ‘Non-normality and tests on variance.’ Biometrika, 40, 318–335.

Box, G. E. P., and Anderson, S. L. (1955). ‘Permutation theory in the derivation of
robust criteria and the study of departures from assumption.’ J. Royal Statist. Soc.,
Ser. B, 17, 1–34.

Bunke, H., and Bunke, O. (Eds.) (1986). Statistical Inference in Linear Models. John Wiley,
Chichester.

Bustos, O. H. (1982). ‘General M-estimates for contaminated p-th order autoregressive
processes: Consistency and asymptotic normality.’ Z. Wahrsch. verw. Geb., 59,
491–504.

Cheng, K. S., and Hettmansperger, T. P. (1983). ‘Weighted least-squares rank estimates.’ 
Comm. Statist., A12, 1069—1086. 

Carroll, R. J. (1978). ‘On the asymptotic distribution of multivariate M-estimates.’ 
J. Multivar. Analysis, 8, 361—371. 

Carroll, R. J., and Ruppert, D. (1982/83). ‘Weak convergence of bounded influence
regression estimates with applications to repeated significance testing.’ J. Statist.
Planning Infer., 7, 117–129.

Chernoff, H., Gastwirth, J., and Johns, M. V. (1967). ‘Asymptotic distribution of linear 
combinations of functions of order statistics with applications to estimation.’ Ann. 
Math. Statist., 38, 52—72. 

Collins, J. R. (1982). ‘Robust M-estimators of location vectors.’ J. Multivar. Analysis, 
12, 480—492. 

Chow, Y.S., and Robbins, H. (1965). ‘On the asymptotic theory of fixed-width sequential 
confidence intervals for the mean.’ Ann. Math. Statist., 36, 457 —462. 

Dionne, L. (1981). ‘Efficient nonparametric estimators of parameters in the general 
linear hypothesis.’ Ann. Statist., 9, 457—460. 

Donoho, D. L., and Huber, P. J. (1983). ‘The notion of breakdown point.’ A Festschrift 
for Erich L. Lehmann (P. J. Bickel, K. Doksum and J. L. Hodges, Eds.), pp. 157—184. 
Wadsworth, California. 

Dutter, R. (1975a). ‘Robust regression: Different approaches to numerical solution and 
algorithms.’ Res. Report No. 6, ETH Zürich. 

Dutter, R. (1975b). ‘Numerical solution of robust regression problems: Computational 
aspects, a comparison.’ Res. Report No. 7, ETH Zürich. 

Dutter, R. (1977). ‘Algorithms for the Huber estimator in multiple regression.’ Comput- 
ing, 18, 167—176. 

Dutter, R. (1978). ‘Robust regression: LINWDR and NLWDR.’ COMPSTAT 1978 
(L. C. A. Corsten, Ed.) Physica Verlag, Vienna. 

Field, C. A., and Hampel, F. R. (1982). ‘Small sample asymptotic distributions of M- 
estimators of location.’ Biometrika, 69, 221—226. 




Forst, F. R., and Ali, M. M. (1981). ‘Monte Carlo studies of some adaptive robust 
procedures for location.’ Canad. J. of Statist., 9, 229–235. 

Freedman, D. A., and Diaconis, P. (1982). ‘On inconsistent M-estimators.’ Ann. Statist., 
10, 454—461. 

Ghosh, M., and Parsian, A. (1981). ‘Admissible linear estimates of the regression 
parameters.’ Calcutta Statist. Assoc. Bull., 30, 107–113. 

Ghosh, M., and Sen, P. K. (1972). ‘On bounded length confidence intervals for the 
regression coefficients based on a class of rank statistics.’ Sankhyā, A34, 33–52. 

Geertsema, J. (1970). ‘Sequential confidence intervals based on rank tests.’ Ann. Math. 
Statist., 41, 1016–1026. 

Gleser, L. J. (1965). ‘On the asymptotic theory of fixed-size sequential confidence 
bounds for linear regression parameters.’ Ann. Math. Statist., 36, 463—467. 

Hájek, J. (1961). ‘Some extensions of the Wald-Wolfowitz-Noether Theorem.’ Ann. 
Math. Statist., 32, 506–523. 

Hájek, J. (1962). ‘Asymptotically most powerful rank-order tests.’ Ann. Math. Statist., 
33, 1124–1147. 

Hájek, J. (1969). A Course in Nonparametric Statistics. Holden-Day, San Francisco. 

Hájek, J. (1970). ‘Miscellaneous problems of rank test theory.’ In: Nonparametric 
Techniques in Statistical Inference (M. L. Puri, Ed.). Cambridge Univ. Press, Cambridge. 

Hájek, J., and Šidák, Z. (1967). Theory of Rank Tests. Academia, Prague. 

Hampel, F. R. (1971). ‘A general qualitative definition of robustness.’ Ann. Math. 
Statist., 42, 1887—1896. 

Hampel, F. R. (1973). ‘Robust estimation: A condensed partial survey.’ Z. Wahrsch. 
verw. Geb., 27, 87—104. 

Hampel, F. R. (1974). ‘The influence curve and its role in robust estimation.’ J. Amer. 
Statist. Assoc., 69, 383–393. 

Hampel, F. R. (1978). ‘Optimally bounding the gross-error-sensitivity and the 
influence of position in factor space.’ Proc. Amer. Statist. Assoc., 1978 Statist. Computing 
Section, pp. 59–64. 

Hampel, F. R., Rousseeuw, P. J., and Ronchetti, E. (1981). ‘The change-of-variance 
curve and optimal redescending M-estimators.’ J. Amer. Statist. Assoc., 76, 643–648. 

Heiler, S., and Willers, R. (1979). ‘Asymptotic normality of R-estimates in the linear 
model.’ Forschungsbericht 79/6, Univ. Dortmund. 

Hodges, J. L., and Lehmann, E. L. (1963). ‘Estimates of location based on rank tests.’ 
Ann. Math. Statist., 34, 598—611. 

Hogg, R. V. (1967). ‘Some observations on robust estimation.’ J. Amer. Statist. Assoc., 
62, 1179—1186. 

Hogg, R. V. (1974). ‘Adaptive robust procedures: A partial review and some suggestions 
for future applications and theory.’ J. Amer. Statist. Assoc., 69, 909–923. 

Hogg, R. V. (1979). ‘Statistical Robustness: One View of its Use in Applications Today.’ 
The American Statistician, 33, 105–108. 

Holland, P. W., and Welsch, R. E. (1977). ‘Robust regression using iteratively reweighted 
least squares.’ Comm. Statist., A6, 813—827. 

Hollander, M., and Wolfe, D. A. (1973). Nonparametric Statistical Methods. John Wiley, 
New York. 

Huber, P. J. (1964). ‘Robust estimation of a location parameter.’ Ann. Math. Statist., 35, 
73—101. 

Huber, P. J. (1965). ‘A robust version of the probability ratio test.’ Ann. Math. Statist., 
36, 1753—1758. 

Huber, P. J. (1967). ‘The behavior of maximum likelihood estimates under nonstandard 
conditions.’ In: Proc. Fifth Berkeley Symp. Vol. 1, 221—233. 

Huber, P. J. (1968). ‘Robust confidence limits.’ Z. Wahrsch. verw. Geb., 10, 269—278. 




Huber, P. J. (1969). Théorie de l’inférence statistique robuste. Presses de l’Université 
Montréal. 

Huber, P. J. (1970). ‘Studentizing robust estimates.’ In: Nonparametric Techniques 
in Statistical Inference (M. L. Puri, Ed.). Cambridge Univ. Press, Cambridge. 

Huber, P. J. (1972). ‘Robust statistics: A review.’ Ann. Math. Statist., 43, 1041–1067. 

Huber, P. J. (1973). ‘Robust regression: Asymptotics, conjectures and Monte Carlo.’ 
Ann. Statist. 1, 799—821. 

Huber, P. J. (1975). ‘Robustness and designs.’ In: A Survey of Statistical Design and 
Linear Models (J. N. Srivastava, Ed.). North-Holland, Amsterdam. 

Huber, P. J. (1977). ‘Robust Statistical Procedures.’ Regional Conf. Ser. in Applied 
Math. No 27. SIAM, Philadelphia. 

Huber, P. J. (1979). ‘Robust smoothing.’ Proc. ARO Workshop on Robustness in Statistics 
(R. L. Launer and G. N. Wilkinson, Eds.), Academic Press, New York. 

Huber, P. J. (1981). Robust Statistics. John Wiley, New York. 

Huber, P. J. (1983). ‘Minimax aspects of bounded-influence regression.’ J. Amer. 
Statist. Assoc., 78, 66–72. 

Huber, P. J. (1982). ‘Recent trends in robustness.’ IEEE Intern. Symp. on Inform. 
Theory, Les Arcs, France. 

Huber, P. J. (1984). ‘Finite-sample breakdown of M- and P-estimators.’ Ann. Statist., 
12, 119—126. 

Huber, P. J., and Dutter, R. (1974). ‘Numerical solutions of robust regression problems.’ 
In: COMPSTAT 1974 (G. Bruckmann, Ed.), Physica Verlag, Vienna. 

Huber, P. J., and Strassen, V. (1973). ‘Minimax tests and the Neyman-Pearson lemma 
for capacities.’ Ann. Statist., 1, 251–263; 2, 223–224. 

Hušková, M. (1982). ‘On bounded length sequential confidence interval for parameter 
in regression model based on ranks.’ Coll. Math. Soc. J. Bolyai, 32, 435–463. 

Hušková, M. (1983–84). ‘Adaptive procedures for the two-sample location model.’ 
Commun. Statist. C, 2, 387–401. 

Hušková, M., and Jurečková, J. (1981). ‘Second order asymptotic relations of M-estimators 
and L-estimators in two-sample location model.’ J. Statist. Planning Infer., 5, 
309–328. 

Jaeckel, L. A. (1971a). ‘Robust estimates of location: Symmetry and asymmetric 
contamination.’ Ann. Math. Statist., 42, 1020–1034. 

Jaeckel, L. A. (1971b). ‘Some flexible estimates of location.’ Ann. Math. Statist., 42, 
1540—1552. 

Jaeckel, L. A. (1972). ‘Estimating regression coefficients by minimizing the dispersion 
of the residuals.’ Ann. Math. Statist., 43, 1449–1458. 

Johns, M. V. (1979). ‘Robust Pitman-like estimators.’ In: Robustness in Statistics 
(R. L. Launer and G. N. Wilkinson, Eds.), pp. 49–60. Academic Press, New York. 

Jung, J. (1955). ‘On linear estimates defined by a continuous weight function.’ Ark. 
Math., 3, 199–209. 

Jurečková, J. (1969). ‘Asymptotic linearity of a rank statistic in regression parameter.’ 
Ann. Math. Statist., 40, 1889–1900. 

Jurečková, J. (1971a). ‘Nonparametric estimate of regression coefficients.’ Ann. Math. 
Statist., 42, 1328–1338. 

Jurečková, J. (1971b). ‘Asymptotic independence of rank test statistic for testing 
symmetry on regression.’ Sankhyā, A33, 1–18. 

Jurečková, J. (1973a). ‘Central limit theorem for Wilcoxon rank statistics process.’ 
Ann. Statist., 1, 1046–1060. 

Jurečková, J. (1973b). ‘Asymptotic behaviour of rank and signed-rank statistics from 
the point of view of applications.’ Proc. Prague Symp. on Asymptotic Statistics I, 
139–155. 




Jurečková, J. (1977). ‘Asymptotic relations of M-estimates and R-estimates in linear 
regression model.’ Ann. Statist., 5, 464–472. 

Jurečková, J. (1978a). ‘Asymptotic relations of least-squares estimate and of two robust 
estimates of regression parameter vector.’ Trans. 7th Prague Conf. and European 
Meeting of Statisticians II, pp. 231–237. 

Jurečková, J. (1978b). ‘Bounded-length sequential confidence interval for regression 
and location parameters.’ Proc. 2nd Prague Conf. on Asymptotic Statistics II (P. 
Mandl and M. Hušková, Eds.), pp. 231–237. North-Holland, Amsterdam. 

Jurečková, J. (1980). ‘Asymptotic representation of M-estimators of location.’ Math. 
Operationsforsch. Statistik, Ser. Statistics, 11, 61–73. 

Jurečková, J. (1981). ‘Tail-behavior of location estimators.’ Ann. Statist., 9, 578–585. 

Jurečková, J. (1983a). ‘Robust estimators of location and regression parameters and 
their second order asymptotic relations.’ Trans. 9th Prague Conf. on Inform. Theory, 
Random Processes and Statist. Decis. Functions, pp. 19–32. Reidel, Dordrecht. 

Jurečková, J. (1983b). ‘Winsorized least-squares estimator and its M-estimator 
counterpart.’ In: Contributions to Statistics. Essays in Honour of Norman L. Johnson (P. K. 
Sen, Ed.), pp. 237–245. North-Holland, Amsterdam. 

Jurečková, J. (1983c). ‘Robust estimators and their relations.’ Acta Universitatis 
Carolinae-Mathematica et Physica, 24, No. 1. 

Jurečková, J. (1983d). ‘Trimmed polynomial regression.’ Comment. Math. Univ. Carolinae, 
24, 597–607. 

Jurečková, J. (1984a). ‘Regression quantiles and trimmed least-squares estimator under 
a general design.’ Kybernetika, 20, 345–357. 

Jurečková, J. (1984b). ‘Sequential confidence intervals based on robust estimators.’ 
Sequential Methods in Statistics: Banach Center Publications, 16, 309–319. Polish 
Scientific Publishers, Warsaw. 

Jurečková, J. (1984c). ‘M-, L- and R-estimators.’ Handbook of Statistics IV, Chapter 21, 
pp. 463–485 (P. R. Krishnaiah and P. K. Sen, Eds.), North-Holland, Amsterdam. 

Jurečková, J., and Sen, P. K. (1981a). ‘Invariance principles for some stochastic processes 
related to M-estimators and their role in sequential statistical inference.’ Sankhyā, 
A43, 191–210. 

Jurečková, J., and Sen, P. K. (1981b). ‘Sequential procedures based on M-estimators 
with discontinuous score functions.’ J. Statist. Planning Inference, 5, 253–266. 

Jurečková, J., and Sen, P. K. (1982a). ‘M-estimators and L-estimators of location: 
Uniform integrability and asymptotically risk-efficient sequential versions.’ Comm. 
Statist., C1, 27–56. 

Jurečková, J., and Sen, P. K. (1982b). ‘Simultaneous M-estimator of the common 
location and the scale-ratio in the two-sample problem.’ Math. Operationsforsch. Statist., 
Ser. Statist., 13, 163–169. 

Kagan, A. M., Linnik, J. V., and Rao, C. R. (1965). ‘On a characterization of the normal 
law based on a property of the sample average.’ Sankhyā, A27, 405–406. 

Kagan, A. M., Linnik, J. V., and Rao, C. R. (1973). ‘Characteristic Problems of Mathe- 
matical Statistics.’ Nauka, Moscow (in Russian). 

Klein, R., and Yohai, V. J. (1981). ‘Asymptotic behavior of iterative M-estimators for 
the linear model.’ Comm. Statist., A10, 2373—2388. 

Koenker, R., and Bassett, G. (1978). ‘Regression quantiles.’ Econometrica, 46, 33–50. 

Koenker, R. W., and Bassett, G. W. (1982). ‘Robust tests for heteroscedasticity based 
on regression quantiles.’ Econometrica, 50, 43–62. 

Koul, H. L. (1971). ‘Asymptotic behavior of a class of confidence regions based on ranks 
in regression.’ Ann. Math. Statist., 42, 42—57. 

Koul, H. L. (1977). ‘Behavior of robust estimators in the regression model with dependent 
errors.’ Ann. Statist., 5, 681–699. 




Kraft, C., and van Eeden, C. (1972a). ‘Asymptotic efficiencies of quick methods of 
computing efficient estimates based on ranks.’ J. Amer. Statist. Assoc., 67, 199 —202. 

Kraft, C., and van Eeden, C. (1972b). ‘Linearized rank estimates and signed-rank 
estimates for the general linear hypothesis.’ Ann. Math. Statist., 43, 42–57. 

Krasker, W. S., and Welsch, R. E. (1982). ‘Efficient bounded-influence regression 
estimation.’ J. Amer. Statist. Assoc., 77, 595–604. 

Launer, R. L., and Wilkinson, G. N. (Eds.) (1979). Robustness in Statistics. Academic 
Press, New York. 

Lehmann, E. L. (1959). Testing Statistical Hypotheses. John Wiley, New York. 

Lehmann, E. L. (1963). ‘Nonparametric confidence intervals for a shift parameter.’ 
Ann. Math. Statist., 34, 1507–1512. 

Lehmann, E. L. (1966). ‘Some concepts of dependence.’ Ann. Math. Statist., 37, 
1137–1153. 

Lehmann, E. L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, 
San Francisco. 

Lehmann, E. L. (1983). Theory of Point Estimation. John Wiley, New York. 

Maronna, R. A. (1976). ‘Robust M-estimates of multivariate location and scatter.’ 
Ann. Statist., 4, 51–67. 

Maronna, R. A., Bustos, O., and Yohai, V. (1979). ‘Bias- and efficiency robustness of 
general M-estimators for regression with random carriers.’ In: Smoothing Techniques 
for Curve Estimation (T. Gasser and M. Rosenblatt, Eds.), pp. 91–116. Lecture Notes in 
Math. No 757, Springer-Verlag, New York. 

Millar, P. W. (1981). ‘Robust estimation via minimum distance methods.’ Z. Wahrsch. 
verw. Geb., 55, 73—89. 

Nowak, H., and Zentgraf, R. (Eds.) (1980). ‘Robuste Verfahren.’ Medizinische 
Informatik und Statistik No 20. Springer-Verlag, New York. 

Portnoy, S. L. (1977). ‘Robust estimation in dependent situations.’ Ann. Statist., 5, 
22—43. 

Prakasa Rao, B. L. S. (1981). ‘Asymptotic behavior of M-estimators for the linear model 
with dependent errors.’ Bull. Inst. Math. Acad. Sinica, 9, 367—375. 

Relles, D. A. (1968). ‘Robust regression by modified least squares.’ PhD Thesis, New 
York. 

Relles, D. A., and Rogers, W. H. (1977). ‘Statisticians are fairly robust estimators of 
location.’ J. Amer. Statist. Assoc., 72, 107—111. 

Rey, W. J. J. (1978). Robust Statistical Methods. Lecture Notes in Math. No 690. 
Springer-Verlag, New York. 

Rieder, H. (1980). ‘Estimates derived from robust tests.’ Ann. Statist., 8, 106—115. 

Rieder, H. (1981). ‘On local asymptotic minimaxity and admissibility in robust 
estimation.’ Ann. Statist., 9, 266–277. 

Rivest, L. P. (1982). ‘Some asymptotic distributions in the location-scale model.’ Ann. 
Inst. Statist. Math., A34, 225–239. 

Rocke, D. M., and Downs, G. W. (1981). ‘Estimating the variance of estimators of loca- 
tion: Influence curve, Jackknife and Bootstrap.’ Comm. Statist., B10, 221—248. 

Rocke, D. M., Downs, G. W., and Rocke, A. J. (1982). ‘Are robust estimators really 
necessary?’ Technometrics, 24, 95–101. 

Rousseeuw, P. J. (1981). ‘A new infinitesimal approach to robust estimation.’ Z. Wahrsch. 
verw. Geb., 56, 127–132. 

Rousseeuw, P. J. (1984). ‘Least median of squares regression.’ Res. Rep. No 199, Centre 
for Statistics and Operations Research, VUB Brussels. 

Rousseeuw, P. J., and Yohai, V. (1983). ‘Robust regression by means of S-estimates.’ 
Res. Rep. No. 197, Centre for Statistics and Operations Research, VUB Brussels. 

Ruppert, D., and Carroll, R. J. (1980). ‘Trimmed least squares estimation in the linear 
model.’ J. Amer. Statist. Assoc., 75, 828—838. 




Sacks, J., and Ylvisaker, D. (1972). ‘A note on Huber’s robust estimation of a location 
parameter.’ Ann. Math. Statist., 43, 1068–1075. 

Sacks, J., and Ylvisaker, D. (1982). ‘L- and R-estimation and the minimax property.’ 
Ann. Statist., 10, 643—645. 

Sen, P. K. (1980). ‘On nonparametric sequential point estimation of location based on 
general rank order statistics.’ Sankhya, A42, 202—218. 

Sen, P. K. (1981). Sequential Nonparametrics: Invariance Principles and Statistical 
Inference. John Wiley, New York. 

Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. John Wiley, 
New York. 

Siegel, A. F. (1982). ‘Robust regression using repeated medians.’ Biometrika, 69, 
242—244. 

Shorack, G. (1969). ‘Asymptotic normality of linear combinations of functions of order 
statistics.’ Ann. Math. Statist., 40, 2041 —2050. 

Shorack, G. (1976). ‘Robust studentization of location estimates.’ Statistica Neerlandica, 
30, 119—141. 

Stein, C. (1956). ‘Efficient nonparametric testing and estimation.’ In: Proc. Third 
Berkeley Symp., 1, 187—195. 

Stigler, S. M. (1973). ‘Simon Newcomb, Percy Daniell, and the history of robust esti- 
mation 1885—1920.’ J. Amer. Statist. Assoc., 68, 872—879. 

Stigler, S. M. (1974). ‘Linear functions of order statistics with smooth weight functions.’ 
Ann. Statist., 2, 676—693. 

Stone, C. J. (1975). ‘Adaptive maximum likelihood estimators of a location parameter.’ 
Ann. Statist., 3, 267—284. 

Takeuchi, K. (1971). ‘A uniformly asymptotically efficient estimator of a location para- 
meter.’ J. Amer. Statist. Assoc., 66, 292—301. 

Torgersen, E. N. (1971). ‘A counterexample on translation invariant estimators.’ Ann. 
Math. Statist., 42, 1450—1451. 

Tukey, J. W. (1962). ‘The future of data analysis.’ Ann. Math. Statist., 33, 1–67. 

Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, Mass. 

Van Eeden, C. (1970). ‘Efficiency-robust estimation of location.’ Ann. Math. Statist., 41, 
172—181. 

Van Eeden, C. (1972). ‘Analogue, for signed-rank statistics, of Jurečková’s asymptotic 
linearity theorem for rank statistics.’ Ann. Math. Statist., 43, 791–802. 

Van Eeden, C. (1983). ‘On the relation between L-estimators and M-estimators and their 
asymptotic efficiency relative to the Cramér-Rao lower bound.’ Ann. Statist., 11, 
674—690. 

Zacks, S. (1971). The Theory of Statistical Inference. John Wiley, New York. 


Chapter 3 


Models with errors-in-variables 


In the previous chapters we considered linear and nonlinear regression models 
in detail. For illustrative purposes we give a simple linear regression model: 


$$y_i = \alpha + \beta x_i + \varepsilon_i, \qquad i = 1, \dots, n.$$


We always assumed the regressors $x_i$ to be observed without errors, which is 
indeed the case in important practical applications. Think of the models of 
variance analysis, in which the $x_i$ are so-called 1-0 quantities, depending on 
the presence of an influence factor. In econometrics, too, we often 
hope that the errors $\varepsilon_i$ in the equations exceed the errors of measurement. 
But after a more careful examination of many systems in practice this can 
no longer be assumed and, from the errors-in-equations model, there arises 
a so-called errors-in-variables model. Special errors-in-variables models are 
known in the literature by the names functional relation(ship), structural 
relation(ship), or even functional model (cf. Malinvaud, 1966, p. 378). Some 
older terminology can also be found. Furthermore, there are close relationships 
with linear simultaneous equations. 

That is why such models should be used more often relative to the 
regression models, all the more so because the uncritical application 
of estimators that are otherwise used in regression models may lead to 
considerable misinterpretations in the described functional models (see Section 
3.1.2). 

One reason why these models are still not very widely used at present may 
surely be that the inference in such models becomes essentially more difficult. 
Firstly, this is due to totally different identification problems. Secondly, the 
least squares estimator is still relatively simple to compute for linear regression 
models, whereas there arise eigenvalue problems for linear errors-in-variables 
models. In more general models we do not obtain explicit formulas. 
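
To make the contrast concrete, the following sketch fits a straight line by orthogonal regression, one standard case in which the estimate is obtained from an eigenvalue problem rather than from the normal equations. It assumes equal error variances in both coordinates, which the paragraph above does not state; the function name `orthogonal_regression` and the sample points are hypothetical and serve only as an illustration.

```python
import math


def orthogonal_regression(xs, ys):
    """Fit y = alpha + beta * x by minimizing squared orthogonal distances.

    Sketch only: assumes the errors in x and y have equal variance.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Second sample moments about the means.
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Smallest eigenvalue of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    lam = 0.5 * (sxx + syy - math.sqrt((sxx - syy) ** 2 + 4.0 * sxy ** 2))
    # The matching eigenvector (sxy, lam - sxx) is normal to the fitted line,
    # so the line itself has direction (sxx - lam, sxy).
    if sxx - lam == 0.0:
        raise ValueError("fitted line is vertical; slope is undefined")
    beta = sxy / (sxx - lam)
    alpha = my - beta * mx
    return alpha, beta


# Points lying exactly on y = 1 + 2x should be recovered up to rounding.
alpha_hat, beta_hat = orthogonal_regression([0.0, 1.0, 2.0, 3.0],
                                            [1.0, 3.0, 5.0, 7.0])
```

For a 2 x 2 moment matrix the eigenvalue is available in closed form, which is why no linear algebra library is needed here; in higher dimensions a numerical eigenvalue routine would take its place.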

Within the last few years the rapid development of computers has made 
possible essential progress in the application of such models. In turn, the 
increasing applications have also stimulated the theoretical investigations. 
Corresponding to this rapid development there have been good introductions 
into this problem which contain a survey mainly over linear functional 
relations (Madansky, 1959; Kendall and Stuart, 1961, ch. 29; Malinvaud, 1966, 
ch. 10; Schönfeld, 1971, ch. 11; Schneeweiss, 1971, ch. 7; Zellner, 1971, ch. 5; 
Johnston, 1963; Sprent, 1969, chs. 3, 6; Moran, 1971). Practical problems with 
nonlinear errors-in-variables models are extensively treated in Bard (1974). 
[...] treatment of [...] mainly in the fields of asymptotics, distribution 
approximation, optimality, Bayes theory, nonlinear models, numerics. On 
the one hand, this chapter aims at giving a reasonable introduction into the 
problem based on the mentioned works. On the other hand, it is necessary to 
take into account a series of new results when comprehensively representing 
these known results and to aim at a relatively closed and comprehensive 
representation of the present level in the statistical inference to errors-in-variables 
models. 

We give a short survey of the contents of the chapter (see Table 3.1.1 and 
the table of contents). Simple examples from practical applications are 
discussed in Section 3.1.1. In models with errors-in-variables the application of 
the ordinary least squares estimator known from the regression model may 
lead to considerable errors, which is shown in Section 3.1.2. General 
errors-in-variables models are described in Section 3.1.3. A survey concerning 
identifiability problems is contained in Section 3.1.4. Because of their fundamental 
importance, the proofs of the theorems by Reiersøl (1950) are also given. The 
statement of Section 3.1.5, too, is relatively little known. If the structural 
parameter is not identifiable in a model with a random experimental design, 
then it is not consistently estimable in the corresponding model with 
nonrandom experimental design. In Section 3.1.6 we give a survey of the almost 
300 works in the field of models with errors-in-variables in order to make the 
orientation easier. 

Section 3.2 contains results on maximum likelihood estimators. Two-dimensional 
linear functional relations are considered in Section 3.2.1. The 
works by Cox (1976) and Dolby (1976a) made it possible to treat models with 
nonrandom experimental design as well as those with normally distributed 
random experimental design in a unified approach. Following this we consider 
multivariate models with nonrandom experimental design only, with one 
exception each in Sections 3.3.2 and 3.5.4. First the connection between the 
maximum likelihood and least squares estimators (MLE and LSE, respectively) 
is described in Section 3.2.2. In Section 3.2.3 multivariate linear models with 
known error covariance are investigated. The MLE with independent errors 
of measurement is obtained from an eigenvalue problem. Besides this well-known 
result, equivariance and certain uniqueness properties are shown (according 
to Höschel, 1978a). For linear models with a covariance that is known 
except for a factor, some known results achieved by the coordinate-free 
representation are summarized in Section 3.2.4. Section 3.2.5 contains the 
fundamental theorem by Anderson (1951a) about the MLE with unknown 
covariance for normally distributed independent observation errors, which has 
been overlooked for a long time. In the same way as with known covariance, 
this MLE may be obtained from certain eigenvectors. For nonlinear models 
we describe, in Sections 3.2.6 and 3.2.7, possibilities for simplification, which 
result from special suppositions about the covariance of the errors. 
Identifiability properties of estimators are also given. 

The ‘standard asymptotics’ for MLE and WLSE in nonlinear models with 
a fixed experimental design are considered in Section 3.5.1. There result con- 
sistency, asymptotic normality, and optimality as well as a simple derivation 
of some formulae for the asymptotic covariance. But, for a weakly increasing 
experimental design, too, we can obtain the asymptotic normality. In Section 
3.5.5 this is demonstrated by means of a modified Gauss-Newton iteration 
proposed by Fuller and Wolter (1977). The estimation is described earlier in 
Section 3.3.3. The latter is done within the framework of representing alter- 
natives to the MLE in Section 3.3. Section 3.3.1 considers linear two-dimen- 
sional models, especially instrumental-variables estimators for simple examples. 
The relations between known estimators in linear functional relations and 
simultaneous equations are investigated in detail. Starting from Anderson 
(1976) and the following discussion there we describe, on the one hand, the 
relations between MLE and OLSE in functional relations and of limited- 
information MLE and two-stage LSE in simultaneous equations on the other 
hand. 

An approximate comparison of accuracy of these estimators is contained 
in the summary of the results by Anderson (1974) in Section 3.5.2. The modified 
MLE and two-stage LSE investigated by Fuller (1977) follow in a natural way 
in Section 3.3.1. The results of the approximate comparison of accuracy are 
given in Section 3.5.3. Up to that point only models with nonrandom 
experimental design and independent errors are treated. For linear models in which 
errors of measurement and design-points are generated by certain time series, 
Robinson (1977) could construct an asymptotically normally distributed 
estimator, which is described in Section 3.3.2 and for which consistency and some 
identifiability problems are investigated in Section 3.5.4. 

An independent extensive section is devoted to asymptotic investigations 
in linear errors-in-variables models, where great value is set on a unified re- 
presentation for the various models. Section 3.3.4 provides an introduction 
into the general model taken as a basis here. The object of the investigation is a 
model with nonrandom, nonobservable variables; for this purpose we first 
explain the parametrization and compile results on the maximum likelihood 
estimator. From the relation of models with random and nonrandom non- 
observable variables there results a ‘canonical’ estimation method by means of 
instrumental variables in the latter case, as a formal special case of which we 
get the maximum likelihood estimator. Within this framework we then develop 
the asymptotic theory in Section 3.4. Following the introduction of the asymp- 
totic model in Section 3.4.1, the consistency is considered in Section 3.4.2 and 
an interpretation of the necessary assumption is given. In Section 3.4.3 some 
special cases from the literature are discussed. Then the question of the 
efficiency of the MLE (more generally, that of the defined canonical instrumental 
variable estimation) is at the centre of our considerations. The starting point 
here is the fact that in the model the case of infinitely many unknown 
incidental parameters (hence of an infinite-dimensional parameter space) arises, in 
which the classical theory of asymptotic efficiency is not applicable. A 
model-specific solution of the problem is given here by considering a certain 
heuristically motivated class of estimators (asymptotic Q,,-estimators). 
Roughly speaking, this class consists of estimators which are functions of the 
second sample-moments; with this, the most important alternatives to the 
MLE known in the literature are covered. The asymptotic efficiency of the 
MLE can be proved by means of a corresponding optimality theory; this is 
done by means of the covariance matrix of the normal limit distribution, where 
normality is assumed. Furthermore, a simple efficient estimation results from an 
improvement method (Sections 3.4.4 and 3.4.5). Some well-known results 
about the limit distribution and the comparison are represented as special cases 
or variants. Section 3.4.6 shows some results for the nonnormal case and 
Section 3.4.7 contains supplementary remarks in connection with the basic 
problem of the model. 

[Table 3.1.1, a two-page survey table of the models and estimators treated in 
this chapter. Column headings legible in the source include incidental 
parameters, observations, distribution, covariance matrix, estimation procedure, 
and section or formula number; the table body is not recoverable.] 

Hypothesis testing and region estimation are treated for 
linear models in Sections 3.6 and 3.7. The relatively short representation is 
based on the results of Anderson (1951a). 

Finally we describe some possibilities for the numerical computation of the 
WLSE. The often applied two-dimensional linear models with different but 
known covariances are considered first. In Section 3.8.1 a method of 
Williamson (1968) to compute the GLSE is given. In general nonlinear models the 
dimension of the unknown parameter that is to be estimated becomes very 
large as compared to the regression models. But the special structure of 
errors-in-variables models permits a practicable transformation of the known 
iteration methods even for a relatively large size of the experimental design. 
Based on work by O'Neill, Sinclair, and Smith (1969), a Newton-Raphson-type 
method is described for two-dimensional polynomial models in Section 
3.8.2. In Section 3.8.3 we consider the methods for general models with 
errors-in-variables. In particular, we discuss the special structure of the 
Gauss-Newton method in these models. 


3.1 Models with errors-in-variables 


3.1.1 Functional and structural relations: an introduction 


When we investigate a concrete system we first establish a deterministic model 
as a set of structures (cf. Bunke and Bunke, 1986, Section 1.3). In practical 
problems such models are mostly described as functional models. They reflect 
the knowledge or the assumptions about the mathematical relation between 
the investigated variables, in the first stage mostly in the form of equations. 
In practice, these variables can only be observed with an evidently random 
error. This leads to the corresponding observation model. 


Example 3.1.1 The relation between mass $m$, volume $v$, and density $d$ is
$m = dv$. We want to 'determine', or more precisely, to estimate, the density
$d$ on the basis of several measurements of $m$ and $v$. This example was discussed
in detail in Madansky (1959).


Example 3.1.2 For uniformly accelerated motion we have, for the starting
velocity $v_0$, the time $t$, the acceleration $b$, and the covered distance $s$,

$$s = v_0 t + b t^2/2.$$

For example, we are interested in a confidence region for $v_0$ and $b$ if
measurements of the two other quantities are available.


Example 3.1.3 With the physical pendulum we have, for the duration of the
oscillation $T$, mass $m$, the moment of inertia $M$, and the distance $a$ between
the oscillating pendulum rod and the centre of gravity,

$$T^2 = 4\pi^2 M/(mga),$$

where $g$ is the acceleration of gravity.

Already from the simple linear functional relations, important fundamental
questions can be discussed. Let the basic deterministic model for the variables
$\xi$, $\eta$ be linear:

$$\eta = \alpha + \beta\xi. \tag{1}$$


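Now we observe $n$ points $\mu_i = [\xi_i, \eta_i]$, $i = 1, \dots, n$, satisfying this model. But
they can only be observed with a random observation error $\zeta_i$. Thus,

$$z_i = \mu_i + \zeta_i \tag{2}$$

is observed, or written componentwise, for $z_i := [x_i, y_i]$ and $\zeta_i := [\delta_i, \varepsilon_i]$,

$$x_i = \xi_i + \delta_i, \qquad y_i = \eta_i + \varepsilon_i. \tag{3}$$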


In case $\delta_i = 0$, we obtain a bivariate linear regression model. The vector
$\mu_{(n)} := (\mu_i)_{i=1,\dots,n}$ of the observed test points is consequently 'a sort of'
experimental design. But this one is unknown and, contrary to the regression
models, it cannot be planned directly. The errors are often assumed to
be independent:


DeEGi=1.e2, x € Mz. 




We can also simply write repeated observations of the same experimental design:

$$z_{ik} = \mu_i + \zeta_{ik}, \qquad k = 1, \dots, m_i, \tag{4}$$

where $m_i$ denotes the number of replications in the $i$th design point.
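
To make the observation model with replications concrete, the following minimal Python sketch generates observations of the form (4) for a linear functional relation. The function names, the normal error distribution and all numerical values are illustrative assumptions and do not come from the text.

```python
import random
from statistics import mean


def simulate_replicated(alpha, beta, xi, m, sd_delta, sd_eps, seed=0):
    """Generate z_ik = mu_i + zeta_ik with mu_i = [xi_i, alpha + beta * xi_i].

    xi : list of fixed design values xi_i (assumed known to the simulator only)
    m  : list of replication numbers m_i, one per design point
    Errors are drawn as independent normal variables (an assumption).
    """
    rng = random.Random(seed)
    data = []
    for xi_i, m_i in zip(xi, m):
        eta_i = alpha + beta * xi_i
        reps = [
            (xi_i + rng.gauss(0.0, sd_delta), eta_i + rng.gauss(0.0, sd_eps))
            for _ in range(m_i)
        ]
        data.append(reps)
    return data


def design_point_means(data):
    """Average the replications of each design point componentwise."""
    return [
        (mean(x for x, _ in reps), mean(y for _, y in reps))
        for reps in data
    ]
```

Averaging the replications, as in `design_point_means`, gives one natural estimate of each unknown design point $\mu_i$; with `sd_delta = sd_eps = 0` the averages reproduce the design points exactly.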


Example 3.1.4 (cf. Barnett, 1970) In a medical experiment the relation
between the protein level $\eta$ of the urine and the applied drug dose $\xi$ was
investigated and repeated measurements were taken. The relation between $\xi$
and $\eta$ is supposed to be linear. Now the $\xi_i$ and hence the $\mu_i$ can very well be
realizations of random quantities.


Example 3.1.5 Possibly the dose $\xi_i$ obtained in Example 3.1.4 has to be modelled
as a random quantity with a distribution of its own. Random variations
of the dosage arise, e.g. by the filling of the syringe. We proceed in the same 
way if we obtain the specific dosages from a certain chemical process where an 
exact measurement of the dose is not possible. 


Now, if we make special suppositions about the distribution of $\mu_{(n)}$, i.e.
the common distribution of the $\mu_i$, $i = 1, \dots, n$, it is sometimes even possible
to investigate the distribution parameters of $\mu_{(n)}$.

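A simple model of this kind is

$$\eta_i = \alpha + \beta\xi_i, \quad i = 1, \dots, n; \qquad z_i = \mu_i + \zeta_i, \quad \mu_i = [\xi_i, \eta_i], \tag{5}$$

with the distribution model

$$\xi_i \sim \mathcal{P}^{\xi} = \{P_\theta \mid \theta \in \Theta\}, \qquad \zeta_i \sim \{N(0, \Sigma) \mid \Sigma \in \Gamma\},$$

where the $\xi_i$ and the $\zeta_i$ are supposed to be stochastically independent.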

This model was thoroughly discussed in the literature, where the terminology
'linear structural relation' was adopted. It was of great importance because
some fundamental difficulties with identifiability arose already in this simple
model and had to be discussed in detail. On the other hand, only this simple
model was accessible to numerical computation for a long time. Namely,
only in the case that the $P_\theta$ are normal distributions can we secure the
identifiability of $\alpha$ and $\beta$ under certain restrictions on $\Gamma$, and we can obtain, besides
elementary computable estimates for $\alpha$ and $\beta$, those for $\Sigma$ and $\theta$, too. But
other models with a random experimental design and the usual assumptions
of normality could also be investigated, at least asymptotically; e.g.
the following model with replications of observations (cf. Cox, 1976; Dolby,
1976a; Patefield, 1977a; Brown, 1978a; Chan and Mak, 1979a, b):


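$$\eta_{ij} = \alpha + \beta\xi_{ij}, \qquad \xi_{ij} \sim \{N(\vartheta_i, \sigma_i) \mid \vartheta_i \in \mathbb{R},\ \sigma_i \in \mathbb{R}^{+}\}, \tag{6}$$

$$z_{ij} = \mu_{ij} + \zeta_{ij}, \qquad \zeta_{ij} \sim \{N(0, \Sigma) \mid \Sigma = \mathrm{diag}(\sigma_1, \sigma_2),\ \sigma_i \ge 0\}, \tag{7}$$

$$i = 1, \dots, n, \qquad j = 1, \dots, m_i.$$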




With further restrictions on the parameter region this model also yields the
other models mentioned so far. But it is obvious that the same functional
model

$$\eta_i = \alpha + \beta\xi_i, \qquad z_i = \mu_i + \zeta_i \tag{8}$$

may be completed by different distribution models.

From the supposition of nonrandom $\mu_i$ to the different distribution assumptions
for $\mu_{(n)}$, a great number of practically important linear models can be
imagined. This is also the case for simple bivariate nonlinear models. Let us
take for instance the simple quadratic model

$$\eta_i = \alpha + \beta_1\xi_i + \beta_2\xi_i^2, \qquad z_i = \mu_i + \zeta_i \tag{9}$$

without distribution assumptions.

Finally, all essential differences between such models result from the problem 
whether with the increasing number of observations, the number of the points 
$\mu_i$, $i = 1, \dots, n$, which are usually denoted as incidental parameters, also
increases. For all these models we use the term ‘functional relation’ and — if 
necessary — we distinguish between those with a random or a nonrandom 
experimental design. 


3.1.2 Comparison with regression models 


The regression model is a special case of a linear functional relation if the 
independent variables may be observed without errors. In the following we will 
demonstrate — by means of the bivariate linear functional relation (LIFU) 
from Section 3.1.1 — the following facts: 


1. In a LIFU we do not have to distinguish ‘regressors’ and ‘regressands’, 
nor ‘exogenous’ and ‘endogenous’ variables. 

2. A LIFU may be formally written as a linear regression model with stocha- 
stic regressors, but in such a ‘regression model’ the usual supposition on the 
independence of equation errors and regressors is not fulfilled. 

3. This is the reason why the uncritically applied least squares method fails
here and provides inconsistent estimates.


In (1) the LIFU was given in the form $\eta_i = \alpha + \beta\xi_i$. Then $\xi_i$ need not be an
exogenous variable, as Examples 3.1.1-3.1.3 show. Essential in these examples
is that the vectors $[\xi_i, \eta_i]$ lie on a straight line in the plane. But the special
parametrization (1) does not include the straight line $0 = \xi_i$, unless we
admit this special case in the form $\beta = \infty$. Thus, in
many cases we will not have prescribed exogenous variables in an implicit
representation of the form

$$0 = 1 + \beta_0\xi_i + \beta_1\eta_i \tag{10}$$

and we will not be able to mark either of the two variables as being exogenous
or independent.




Further, from (1) and (2) we obtain the representation

$$y_i = \alpha + \beta x_i + (\varepsilon_i - \beta\delta_i). \tag{11}$$

Since the observations $z_i = [x_i, y_i]$ are known, we have with (11) a regression
model in which the 'regressands' $y_i$ and the 'regressors' $x_i$ were observed.
The equation error $\varepsilon_i - \beta\delta_i$ has a vanishing expectation as in the
regression model. But

$$\mathrm{Cov}(x_i, \varepsilon_i - \beta\delta_i) = \mathrm{Cov}(\xi_i + \delta_i, \varepsilon_i - \beta\delta_i) = -\beta\sigma_\delta \tag{12}$$

holds, where $\sigma_\delta$ denotes the variance of $\delta_i$, a statement which obviously is true for the LIFU (6), too. Hence, the
covariance between the regressors and equation errors vanishes only for $\sigma_\delta = 0$
if we do not take into account the special case $\beta = 0$.
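
The step behind (12) can be spelled out as follows. This is only a sketch under the assumptions that $\xi_i$ is nonrandom, that $\delta_i$ and $\varepsilon_i$ have zero mean and are uncorrelated, and that $\sigma_\delta$ denotes the variance of $\delta_i$:

```latex
\begin{align*}
\mathrm{Cov}(x_i,\ \varepsilon_i - \beta\delta_i)
  &= \mathrm{Cov}(\xi_i + \delta_i,\ \varepsilon_i - \beta\delta_i) \\
  &= \mathrm{Cov}(\delta_i, \varepsilon_i) - \beta\,\mathrm{Cov}(\delta_i, \delta_i)
     && \text{(the constant $\xi_i$ drops out)} \\
  &= 0 - \beta\sigma_\delta = -\beta\sigma_\delta .
\end{align*}
```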

We know that in a regression model with random regressors (cf. Bunke and
Bunke, 1986, example 2.1.1) the best linear unbiased estimator results for
each realization $x_{(n)}$ of the random regressors from the OLSE computed with
the regressors fixed at $x_{(n)}$
(see e.g. Rao, 1973, 4.a.11). Thereby we assume the independence of the
regressors $x_i$ and the errors $\varepsilon_i$, of course. The obtained estimator is consistent.
If the regressors and errors are correlated as in the present case, this is no
longer true. In this case the OLSE is

$$\hat\beta := \sum (x_i - \bar{x})(y_i - \bar{y}) \Big/ \sum (x_i - \bar{x})^2, \qquad \hat\alpha := \bar{y} - \hat\beta\bar{x}. \tag{13}$$

The statement of this example is not diminished if we assume the errors $\delta_i$
and $\varepsilon_i$, respectively, to be independent and identically distributed and if $\delta_i$ is
independent of $\varepsilon_i$. We write $\sigma_\delta := D(\delta_i)$, $\sigma_\varepsilon := D(\varepsilon_i)$.
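Furthermore, let

$$\lim_{n\to\infty} \bar\xi_n =: \bar\xi < \infty, \qquad
\lim_{n\to\infty} \sum \xi_i\delta_i/n = \lim_{n\to\infty} \sum \xi_i\varepsilon_i/n = 0, \qquad
\lim_{n\to\infty} \sum (\xi_i - \bar\xi_n)^2/n =: \sigma_\xi < \infty \tag{14}$$

hold for the experimental design $\mu_{(n)}$, where $\bar\xi_n$ denotes the mean of $\xi_1, \dots, \xi_n$.
By a simple application of the law of large numbers we show that

$$\hat\beta \to \lim_{n\to\infty} \sum (\xi_i + \delta_i - \bar\xi_n - \bar\delta)(\beta\xi_i + \varepsilon_i - \beta\bar\xi_n - \bar\varepsilon) \Big/ \sum (\xi_i + \delta_i - \bar\xi_n - \bar\delta)^2 = \beta\sigma_\xi/(\sigma_\xi + \sigma_\delta), \tag{15}$$

$$\hat\alpha \to \alpha + \beta\bar\xi\sigma_\delta/(\sigma_\delta + \sigma_\xi). \tag{16}$$

Thus, the OLSE is inconsistent for $\sigma_\delta \neq 0$.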
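
The limits (15) and (16) can be checked numerically. The following Python sketch simulates a linear functional relation with a fixed design and compares the OLSE (13) with those limits. The design $\xi_i = i \bmod 10$, the normal errors, the parameter values and all function names are illustrative assumptions, not taken from the text.

```python
import random


def ols(x, y):
    """Ordinary least squares for y = alpha + beta * x, as in (13)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def ols_limits(alpha, beta, xi_bar, var_xi, var_delta):
    """Limits of the OLSE according to (15) and (16)."""
    beta_lim = beta * var_xi / (var_xi + var_delta)
    alpha_lim = alpha + beta * xi_bar * var_delta / (var_xi + var_delta)
    return alpha_lim, beta_lim


def simulate(n, alpha, beta, var_delta, var_eps, seed=1):
    """Observations (x_i, y_i) for the assumed fixed design xi_i = i mod 10."""
    rng = random.Random(seed)
    xi = [float(i % 10) for i in range(n)]
    x = [v + rng.gauss(0.0, var_delta ** 0.5) for v in xi]
    y = [alpha + beta * v + rng.gauss(0.0, var_eps ** 0.5) for v in xi]
    return x, y


if __name__ == "__main__":
    x, y = simulate(20000, alpha=1.0, beta=2.0, var_delta=8.25, var_eps=1.0)
    print("OLSE  :", ols(x, y))
    # The design 0, 1, ..., 9 (repeated) has mean 4.5 and variance 8.25.
    print("limits:", ols_limits(1.0, 2.0, 4.5, 8.25, 8.25))
```

With the error variance of $x$ chosen equal to the design variance, the slope limit is $\beta/2$, so the attenuation of the OLSE is clearly visible even for large $n$.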




Under more general assumptions the inconsistency can be proved for multivariate
LIFUs by means of martingale theory (cf. Robinson, 1977, lemma 1;
see also Section 3.5.4 and [A 3.2]). Some practical investigations from econometrics
indicate that the detection of this inconsistency is of fundamental
importance for the correct interpretation of data (see Malinvaud, 1966, p. 380).

Having investigated the three points of comparison mentioned at the beginning,
we now compare the regression models and the LIFU with respect to prediction.
In doing this we follow the thoughts of Malinvaud (1966, pp. 382-383).
For this purpose it suffices to consider the case $\alpha = 0$ in simple LIFUs.

The problem of prediction consists in determining a value $y^*_{n+1}(z_{(n)})$ as a
prediction for $\eta_{n+1}$ or $y_{n+1}$, depending on the observation $z_{(n)}$. Thereby let $z_{(n)}$ and
$\xi_{n+1}$ or $x_{n+1}$ be given. Usually $E((y^*_{n+1}(z_{(n)}) - \eta_{n+1})^2)$ and $E((y^*_{n+1}(z_{(n)}) - y_{n+1})^2)$,
respectively, are minimized as a risk.


1. Functional models of the described kind are often used to predict the
corresponding value of $\eta_{n+1}$ when the 'true' value $\xi_{n+1}$ is known. For instance,
we want to predict the mass $\eta$ on the basis of the estimated density $\hat\beta$ when
the volume $\xi$ takes a certain value. Or, we would like to predict the increase
of the total social consumption that is to be expected when incomes reach a
certain amount. Let $\hat\beta(z_{(n)})$ be an arbitrary estimator of $\beta$. Then $y^*_{n+1} = \hat\beta\xi_{n+1}$
is the prediction and the risk is

$$E(y^*_{n+1} - \eta_{n+1})^2 = \xi_{n+1}^2 E(\hat\beta - \beta)^2. \tag{17}$$

To minimize the risk, the mean square deviation of $\hat\beta$ from $\beta$ has to be minimized.
Because of the inconsistency of the regression OLSE we may obtain
an improvement, at least for large numbers of observations, by using a consistent
estimator.

2. In many practical cases LIFUs are used to predict $y_{n+1}$ when knowing the
observed value $x_{n+1}$ and not the value $\xi_{n+1}$ that was assumed to be known in case 1. For
instance, we want to predict the weight of a workpiece which is not contained
in the sample on the basis of the measured volume, or we want to predict the
consumption of a certain product for a household on the basis of the measured
income.


Now, let $z_{n+1}$ have a distribution that is determined by $\mu_{n+1}$ and $\zeta_{n+1}$ and
that is the same as that of $z_i$, $i = 1, \dots, n$. If the regression function, i.e. the
conditional expectation, of $y_{n+1}$ on $x_{n+1}$ is linear, then the regression OLSE
indeed provides a satisfactory estimate $\hat\beta$ of $\beta$, which is suitable to predict
$y_{n+1}$ for observed $x_{n+1}$. However, for the LIFU the linearity of the relation
$\eta = \beta\xi$ is not necessarily transferred to the regression function of $y_{n+1}$ on
$x_{n+1}$. This is the case only under special assumptions (cf. Lindley, 1947; Kendall
and Stuart, 1961, ch. 29, pp. 56-59). For this general case (and also if $\mu_{n+1}$
does not have the same distribution as the $\mu_i$, $i = 1, \dots, n$) it is natural to take,
as in 1, $y^*_{n+1} = \hat\beta x_{n+1}$ as the prediction because other prediction functions are
more complicated. The error of the prediction is

$$\Delta_{n+1} := y^*_{n+1} - y_{n+1} = x_{n+1}(\hat\beta - \beta) + (\delta_{n+1}\beta - \varepsilon_{n+1}). \tag{18}$$
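With given $x_{n+1}$ it follows for the risk that

$$E(\Delta_{n+1}^2) = x_{n+1}^2 E(\hat\beta - \beta)^2 + 2E\big(x_{n+1}(\hat\beta - \beta)(\delta_{n+1}\beta - \varepsilon_{n+1})\big) + E\big((\delta_{n+1}\beta - \varepsilon_{n+1})^2\big). \tag{19}$$

With the stochastic independence of $\zeta_{n+1}$ and $z_{(n)}$ the second term vanishes
and

$$E(\Delta_{n+1}^2) = x_{n+1}^2 E(\hat\beta - \beta)^2 + \beta^2\sigma_\delta + \sigma_\varepsilon \tag{20}$$

results.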

Hence we can obtain an improvement compared with the inconsistent re- 
gression OLSE in this case, too, by giving a consistent estimation, at least for 
large samples. Finally, the uncritical application of regression methods causes 
misinterpretations also in this second problem of prediction. 
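
The passage from (19) to (20) can be checked term by term. The following sketch assumes, as the text does, that $x_{n+1}$ is treated as given, that $\zeta_{n+1} = [\delta_{n+1}, \varepsilon_{n+1}]$ has mean zero with uncorrelated components of variances $\sigma_\delta$ and $\sigma_\varepsilon$, and that $\zeta_{n+1}$ is independent of $z_{(n)}$ and hence of $\hat\beta$:

```latex
\begin{align*}
E\bigl(x_{n+1}(\hat\beta - \beta)(\delta_{n+1}\beta - \varepsilon_{n+1})\bigr)
  &= x_{n+1}\, E(\hat\beta - \beta)\, E(\delta_{n+1}\beta - \varepsilon_{n+1}) = 0, \\
E\bigl((\delta_{n+1}\beta - \varepsilon_{n+1})^2\bigr)
  &= \beta^2 E(\delta_{n+1}^2) - 2\beta E(\delta_{n+1}\varepsilon_{n+1}) + E(\varepsilon_{n+1}^2)
   = \beta^2\sigma_\delta + \sigma_\varepsilon .
\end{align*}
```

Only the first term of (20) depends on the estimator, which is why a consistent $\hat\beta$ can reduce the risk for large samples.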


3.1.3 Models with errors-in-variables 


3.1.3.1 The fundamental model 


In Bunke and Bunke (1986, sections 1.2 and 1.3), basic definitions for the 
investigation of statistical problems of the formation of models were given. 
The definitions, directed to the application in regression models, contained
the regressands in explicit dependence from the very beginning. Typical
was the example 1.2.5 with the system equation

$$y_i = D(b, x_i, \varepsilon_i, u_i).$$

There $y_i$ was the output vector, $x_i$ the input vector, $u_i$ and $\varepsilon_i$ were the control
variables and error variables, respectively, and $b$ was the system parameter.
There are already simple deterministic models in which such an explicit 
description of the system is no longer suitable. 


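Example 3.1.6 The movement of a celestial body round the sun in a plane
is described by a quadratic equation, i.e. for linear local coordinates $v^{(1)}$ and
$v^{(2)}$ we have

$$0 = \pi^{(1)} + \pi^{(2)}v^{(1)} + \pi^{(3)}v^{(2)} + \pi^{(4)}(v^{(1)})^2 + \pi^{(5)}v^{(1)}v^{(2)} + \pi^{(6)}(v^{(2)})^2 = s_0(v, \pi)$$

with $\pi := [\pi^{(1)}, \dots, \pi^{(6)}]$ and $v := [v^{(1)}, v^{(2)}]$.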




The solution of such an implicit system equation with respect to one variable 
— explained as being independent or exogenous — will not be possible in 
general. Additionally, the preference of one quantity as exogenous often does 
not make any sense. For instance, the two local coordinates are totally equi- 
valent from the practical point of view. 


Definition 3.1.1 (Deterministic functional model) Let $v \in \mathbb{R}^{d_v}$ be a vector of
system-describing variables. The model is given by a parameter-dependent equation

$$0 = s_0(v, \pi), \qquad s_0 : \mathbb{R}^{d_v + d_\pi} \to \mathbb{R}^{c}, \quad c < d_v. \tag{21}$$

$\pi \in \Pi \subseteq \mathbb{R}^{d_\pi}$ is called the system or structure parameter. $s_0$ is said to be the system
or state function. The equation (21) is called the system or state equation. As state
manifold or structure for the parameter $\pi$ we denote the set

$$St_\pi = \{v \mid 0 = s_0(v, \pi)\}. \tag{22}$$

The set of state manifolds admitted in the model

$$St_\Pi := (St_\pi)_{\pi \in \Pi} \tag{23}$$

is called a structural bundle.

The set $\Pi$ of system parameters admitted in the model may possibly be
described by an implicit equation:

$$\Pi = \{\pi \in \mathbb{R}^{d_\pi} \mid 0 = p(\pi)\}, \qquad p : \mathbb{R}^{d_\pi} \to \mathbb{R}^{d_p}. \tag{24}$$

Continuing Example 3.1.6, the following example shall demonstrate the practical
importance of nontrivial restrictions of the kind $0 = p(\pi)$.


Example 3.1.7 Among all orbits of celestial bodies let us only consider para- 
bolas. These are generated by the following restrictions of the parameters 
Geet ee... se) | 2 


gr) gr64) 
0 = det ( ; 
a» gp (3) 
(In fact, here are still included pairs of parallel or imaginary straight lines). 


Already with simple examples of applications, not only implicit equations 
to describe the model may be suitable, but more general sets $\Pi$ and $St_\pi$ may
also occur. 


Example 3.1.8 Again we consider the Kepler model for the movement of the
planets around the sun. The orbits of the planets are ellipses where one focus
lies in the sun. The related model is covered by a certain subset of quadrics,
let us say of the form

$$0 = s_0(v, \pi), \qquad \pi \in \Pi,$$

where $\Pi \subset \mathbb{R}^{d_\pi}$ describes the set of all ellipses from the Kepler model.
However, ellipses are not described by equations of the form $p(\pi) = 0$,
but by inequalities, namely:


fp. S\(Op A.A, <0, 


gt) 64) a) \ 
AG = det a») (3) erate) 
aD yf?) 6) 


(9) (4) 
AAs = det ’ A; = ) oh qe) 
(4) 3) 


Further restrictions result from the demand that the sun lies in one focus. 
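
The difference between an equality restriction (parabolas) and an inequality restriction (ellipses) can be illustrated with the standard classification of a conic by its quadratic part. The Python sketch below uses generic coefficient names `a`, `b`, `c` for the terms $v^{(1)2}$, $v^{(1)}v^{(2)}$ and $v^{(2)2}$; these names, the tolerance and the function itself are illustrative assumptions and do not reproduce the indexing of the parameter vector used in the examples above.

```python
def classify_quadratic_part(a, b, c, tol=1e-12):
    """Classify a*v1**2 + b*v1*v2 + c*v2**2 + (lower-order terms) = 0.

    Only the quadratic part is inspected, through the determinant of the
    symmetric matrix [[a, b/2], [b/2, c]]. Degenerate conics (pairs of
    lines, imaginary curves) are not excluded by this test.
    """
    det = a * c - (b / 2.0) ** 2
    if abs(det) <= tol:
        return "parabolic"   # equality restriction: det = 0
    if det > 0:
        return "elliptic"    # inequality restriction: det > 0
    return "hyperbolic"      # inequality restriction: det < 0
```

A set such as "all ellipses" is therefore an open region of the parameter space rather than the zero set of a function $p$, which is the point made in Example 3.1.8.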


In the general definition of a deterministic functional model we abstract 
from the parametric explicit description from Definition 3.1.1 (cf. Figure 3.1.1). 
The use of the general notions permits a simple and illustrative description of 
estimation methods like the least squares estimator. 


Definition 3.1.2 As a deterministic functional model for a cause-effect relation
between the components of a vector $v \in \mathbb{R}^{d_v}$ of system-describing variables we denote
an arbitrary set $St_\Pi$ of structures

$$St_\Pi = (St_\pi)_{\pi \in \Pi}, \qquad St_\pi \subset \mathbb{R}^{d_v}, \quad \Pi \subset \mathbb{R}^{d_\pi}.$$

The index $\pi \in \Pi$ is called the structural parameter.


Fig. 3.1.1.


After having defined deterministic functional models and having thereby
shown that also implicit and more general models may be of practical importance,
we want to introduce stochastic functional models. They result from
deterministic models if the related observational model includes random variables
coming from random errors of the observations or from the approximation
of the unknown structure. The vectors $v$ can also be distributed randomly on the
structure $St_\pi$.

Functional relations are the first class of models we consider. They include
$n$ fixed points $\mu_i$ on the structure $St_\pi$: $\mu_i \in St_\pi$, $i = 1, \dots, n$, the so-called
experimental design. But the observation $z_i$ of $\mu_i$ can only be obtained with an
additive random error $\zeta_i$.


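Definition 3.1.3 A functional relation with nonrandom experimental design is
given by

$$z_i = \mu_i + \zeta_i, \quad i = 1, \dots, n, \quad \mu_i \in \mathbb{R}^{d_v},$$
$$0 = s_0(\mu_i, \pi), \quad s_0(\cdot, \pi) : \mathbb{R}^{d_v} \to \mathbb{R}^{c}, \tag{25}$$
$$0 = p(\pi), \quad p(\cdot) : \mathbb{R}^{d_\pi} \to \mathbb{R}^{d_p}.$$

The $\mu_i$ are called incidental parameters or design points. The $\zeta_i$ are realizations
of random measurement errors with a distribution model

$$\zeta_{(n)} = (\zeta_i)_{i=1,\dots,n} \sim \mathcal{P}^{\zeta} = \{P_\gamma \mid \gamma \in \Gamma\}.$$

In special applications we do not always have to have a 'direct' observation
error $\zeta_i$ with $z_i = \mu_i + \zeta_i$, as the following example will illustrate.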


Example 3.1.9 As the orbits of the planets are not directly observable on the
apparent celestial globe, we obtain the following situation (cf. Figure 3.1.2).
In the state space the cause-effect relation is described by a deterministic
functional model

$$St_\Pi = \{St_\pi\}_{\pi \in \Pi}.$$

Furthermore we have a transformation $A$ of the 'variable space' into the
'observation space', which is the apparent celestial globe in this case. For
simplicity we consider $A$ to be parameter- and time-independent. Then the
points $\mu_i$, $i = 1, 2, \dots, n$, on the fixed structure $St_\pi$ can no longer be observed,
but only the transformed points $A(\mu_i)$ with

$$z_i = A(\mu_i) + \zeta_i, \qquad i = 1, \dots, n.$$

This example also shows the sequential character of practical modelling
very well. In the beginning Kepler was not able to bring the exact observations
of planets by Brahe into good correspondence with the Copernican model:
$St_\Pi = \{\text{eccentric rotary motion}\}$. In 1609, after he had adjusted the
observations to (estimated) orbits $St_{\hat\pi}$, he found the laws named after him. With
this the Copernican heliocentric system had got rid of its shortcomings. But
with increased accuracy of observations this model does not suffice either. The
long-known rotation of the perihelion of Mercury could be explained sufficiently
exactly only by a model from Einstein's general theory of relativity.


Fig. 3.1.2. 


A slightly more compact way of writing implicit models is

$$z_{(n)} = \mu_{(n)} + \zeta_{(n)}, \qquad \mu_{(n)} \in \mathbb{R}^{n d_v},$$
$$0 = s_{(n)}(\mu_{(n)}, \pi), \qquad \zeta_{(n)} = [\zeta_i]_{i=1,\dots,n}$$

with a distribution model for the observation error:

$$\zeta_{(n)} \sim \mathcal{P}^{\zeta} = \{P_\gamma \mid \gamma \in \Gamma\}.$$

As in this case the regressors are also nonstochastic, these models are called
models with nonrandom experimental design or models with nonrandom incidental
parameters.

In the literature restrictions have been placed on the mean and covariance of
the distributions. In most cases $E\zeta = 0$ is assumed, and for various investigations
the covariance is also assumed to be known, as for example in Britt
and Luecke (1973).




We will mostly treat explicit models in the following. Many implicit models 
in their numerical treatment have to be reduced to explicit ones. The follow- 
ing unified notation shall be used for explicit models. 


Definition 3.1.4 An explicit functional relation is given by the equation

$$\eta_i = r_0(\xi_i, \pi), \qquad \mu_i := [\xi_i, \eta_i],$$
$$x_i = \xi_i + \delta_i, \qquad y_i = \eta_i + \varepsilon_i, \qquad i = 1, \dots, n,$$

where $z_i = [x_i, y_i] \in \mathbb{R}^{d_z}$ are realizations of the random observations $z_i$. The
distribution of $z_i$ is determined by the distribution of the observation error
$\zeta_i = [\delta_i, \varepsilon_i]$. We have $d_z = d_v$. The $\xi_i$ are called incidental parameters.


Mostly we assume the $\zeta_i$ to be independent and identically distributed:

$$\zeta_i \sim \mathcal{P}^{\zeta} = \{P_\gamma \mid \gamma \in \Gamma\}. \tag{27}$$


3.1.3.2 Linear functional relations 


Definition 3.1.5 A (homogeneous multivariate) linear functional relation with
nonrandom experimental design (LIFU*) is given by

$$\eta_i = B\xi_i, \qquad B \in M_{d_y \times d_x},$$
$$x_i = \xi_i + \delta_i, \qquad \dim \xi_i = d_z - c =: d_x, \tag{28}$$
$$y_i = \eta_i + \varepsilon_i, \qquad \dim y_i = c =: d_y$$

and a distribution model for the observation error $\zeta_{(n)} \sim P_\gamma$, $\gamma \in \Gamma$. The vectors
$z_i := [x_i, y_i]$ are the random observations and the $\mu_i := [\xi_i, \eta_i]$ yield the experimental
design.


In many practical applications the $\zeta_i$ are supposed to be stochastically independent.
But there is an increasing number of works that suppose dependences
between the $\zeta_i$. We often use the representation of the parameter as a
column vector, which follows Definition 3.1.4,

$$\pi := [b_1, \dots, b_{d_y}], \qquad b_i \in \mathbb{R}^{d_x},$$

where the $b_i$ are the row vectors of $B$.

But $\eta = B\xi$ does not include all linear $(d_z - c)$-dimensional subspaces. For
$d_z = 2$ we have $\eta = \beta\xi$. Consequently we can obtain the $\eta$-axis only as a limiting
case $\beta = \infty$. Fortunately, by this we only exclude a null set of linear spaces
(cf. Remark 3.2.4 in Section 3.2.5).

It is obvious and more suitable for theoretical purposes to choose the set
$\mathfrak{L}_r$ of all $r$-dimensional subspaces of $\mathbb{R}^d$ as the structural bundle of the
model. Correspondingly let $\mathfrak{L}_{\le r} = \bigcup_{s \le r} \mathfrak{L}_s$. Instead of (28) we obtain as a 'linear
model' more generally

epee Cc La, —dy or LE ey =a, (29) 
or 

Mn EL", LEX -a,- (30) 
Equivalently 

HRA EC Sil Oe SiGe aes (31) 


Using generating matrices for the subspaces we obtain further possibilities 
of representations by 


me RL), LE Mexia (32) 
or Max (d-e) » d=d,, c=d,, 


or 
Mi = Lj, LeMay Pee Ree (33) 


Here £ € 4, is uniquely determined, but neither LZ nor »; are. 
Equivalent to (29) we get 


1 Oe PNY EGO Knees r=—d, or r2d,(=¢). (34) 
A further possibility is provided by 
(Rj, ---) fa)) = Fy — F,- 


An extension of the model is obtained by 


w= Mn C4", LER saa, (35) 
2A Bes, A Oni cara 


with known matrix $A$ (cf. Example 3.1.9).
This model includes some important special cases. For instance, LIFU* with
repeated observations of a fixed experimental design result from

$$A = D \otimes I_d, \qquad D = \mathrm{Diag}(1_{m_i})_{i}, \qquad m = \sum m_i,$$

where the 'design point' $\mu_i$ is observed $m_i$ times. A further special case is
A=U'®Iu,, Cee Mice s sn: 


Analogous to the usual regression models, linear errors-in-variables models
with $r < n$ are called singular. For singular LIFU* we have to carry out identifiability
considerations analogous to those in Bunke and Bunke (1986, 2.2.1)
(cf. Section 3.1.5).


(35) is closely connected with multivariate linear regression models with
restrictions on the regression parameter. Namely, if we write


M = En) > Z => 2m)» & = S(m) > ft = A(L+), (36) 
then we get the model 


Z=MU+E, ZEMuaxm ete. 
(37) 
j OE eI 6a oh Oe er 


Now, if we understand M € Ni,., as a matrix of regression parameters, then 
the mentioned connection results. Notice that here the number of columns 
of the parameter matrix M € M.,., may ‘increase’ together with m. 


3.1.3.3 Linear functional relations with fixed experimental design 
and with linear regression part 


Extensions of this model are closely related to linear simultaneous equations 
(cf. (40)). As LIFU* with a linear regression part we denote the model 

Z=M,U+M.V+$=:MW+S, W=[UiV], M=(M,!mM,), 

(38) 

Me ane Ga) FAG nx men ly =O Le Me 


T= Ce 
Such models were first investigated in detail by Anderson (1951a).
The relations between such regression models and LIFU had been overlooked
for a long time, so that the results derived for them had to be rediscovered for
various special LIFU*.
The way of writing (38) that corresponds to (35) is


2 = (UO @ 13) Mayr + V © La) bene + Gin) (39) 
fae lL, LCenaees i= 1,...,m, Hj € IR, 7 =1,..., Mp. 


There are close connections between LIFU* with a linear regression part 
and linear simultaneous equations. Let v; denote exogenous variables and let 


0= L1%; + fn; + 0, EEE ENG Gs LE My xa, (40) 


be those equations of a complete linear simultaneous equations model for 
which the coefficients of a second vector of exogeneous variables «; and of 




endogenous variables z,; are known to be zero (cf. Anderson (1951a, eqn (6.6)). 
The aim is to utilize this prior information to improve the estimates of the 
remaining smaller number of parameters — called therefore ‘limited information 
maximum likelihood’ (cf. Section 3.3.1.5). In the reduced form, however, some 
parameter matrix M, of u; will occur. But since u; does not occur in (40) we 
must have +’ = 0. Therefore the starting point for further investigations is 
a subsystem of the complete reduced form of the linear simultaneous equations 
model: 
a = My; + My, + Si, 
(41) 
AM 5 05 4 Lee Sg - 
(The questions of identifiability and inference that are related to the original
model and the use of the reduced form are not treated further here; see Anderson,
1951a, pp. 344-345.) But the subsystem of the reduced form (41) yields
nothing but a LIFU* with a linear regression part according to (38), namely
WO (Aly gin aes Ug VV (Og we os, Ua) = 
In the econometric literature we often consider explicit linear simultaneous 
equations in which L is parametrized in the following form: 


Lt = (Bi = I); Be Mip-a) xa (42) 


This representation provides the relation between known estimators for linear 
simultaneous equations and LIFU* (cf. Example 3.2.2 in Section 3.2.5.). 


3.1.3.4 General models with errors-in-variables 


Starting from functional relations we obtain a more general class of interesting
models if we suppose that not all but only some of the variables are observed
with errors. The state vector $v_i$ can be divided into one part $\mu_i$, which is observed
with errors, and a part $w_i$, which is observed without errors:

$$v_i = [w_i, \mu_i].$$

The $w_i$ can also be denoted as regressors. As in functional relations, it shall
be possible that the system function for the $i$th state depends on the index $i$,
too.

Thus, let

$$St_{i\pi} = \{v_i \in \mathbb{R}^{d_v} \mid 0 = s_i(v_i, \pi)\} \tag{43}$$

be the state manifold for the $i$th state.
In functional relations we have $St_{i\pi} = St_{j\pi} =: St_\pi$, $i, j = 1, \dots, n$. Simplifying,
we put

$$s_i(w_i, \mu_i, \pi) =: s_i(\mu_i, \pi), \tag{44}$$

if the (exactly known) regressors $w_i$ play no role in the current considerations.

The common state manifold for all $n$ observed states has the product form

$$S_\pi = \mathop{\times}_{i=1}^{n} St_{i\pi} = \{v_{(n)} \mid 0 = s_i(v_i, \pi),\ i = 1, \dots, n\}, \tag{45}$$

which is typical of errors-in-variables models.
The state vector $v_{(n)}$ and the system parameter vary in the set

$$S = \{[v_{(n)}, \pi] \in \mathbb{R}^{n d_v + d_\pi} \mid 0 = s(v_{(n)}, \pi),\ 0 = p(\pi)\} = \bigcup_{\pi \in \Pi} \{\pi\} \times S_\pi. \tag{46}$$

This set can be denoted as the system manifold.

Here $s = s_{(n)} = (s_i)_{i=1,\dots,n}$ is the system function. With fixed $w_{(n)}$ and $\pi$, the
unknown incidental parameters may vary in the $(w, \pi)$-section of $S$:

$$S_{w\pi} = \{\mu_{(n)} \mid 0 = s_i(w_i, \mu_i, \pi),\ i = 1, \dots, n\}. \tag{47}$$
While in functional relations the structural bundle

$$St_\Pi = \{St_\pi\}_{\pi \in \Pi}, \qquad St_\pi = \{v \in \mathbb{R}^{d_v} \mid 0 = s_0(v, \pi)\}$$

suffices to characterize the model, we have to use the following structural bundle
in general models:

$$S_\Pi = \{S_\pi\}_{\pi \in \Pi}. \tag{48}$$


Mostly we consider explicit models. Thereby the state equation for the $i$th
state has the form

$$\eta_i = r_i(w_i, \xi_i, \pi) \tag{49}$$

with $\eta_i \in \mathbb{R}^c$. In this case the variable vectors $w_i$, $\xi_i$ may be denoted as independent
or exogenous. The functions $r_i$ are regression functions in a generalized
sense. Here, too, we simplify $r_i(w_i, \xi_i, \pi) =: r_i(\xi_i, \pi)$ if the known $w_i$ are of no
importance. We want to emphasize that the variables observed without errors
can also be covered by means of the distribution model for the observation
errors. Then we have to permit singular covariance matrices. This is extensively
developed for linear models in Section 3.4.


3.1.3.5 Regression models 


In the context of the general errors-in-variables models, regression models
can be obtained in a natural way. Regression models are such errors-in-variables
models in which the variables $\mu_i$ observed with errors are explicit functions
of the regressors $w_i$ observed without errors:

$$\mu_i\ (= \eta_i) = r_i(w_i, \pi). \tag{50}$$

Thus, the system functions $s_i$ have the form

$$s_i(w_i, \mu_i, \pi) = -\mu_i + r_i(w_i, \pi). \tag{51}$$


We also obtain regression models if we start from an explicit functional relation
and take a special degenerate distribution in the distribution model for
the observation errors. For this purpose we start from an explicit relation

$$\eta_i = r_i(\xi_i, \pi), \tag{52}$$

$$\mu_i = [\xi_i, \eta_i], \qquad z_i = [x_i, y_i], \qquad \zeta_i = [\delta_i, \varepsilon_i].$$

In the distribution model for the observation errors we suppose $\delta_i = 0$. Then
the $x_i = \xi_i$ are the regressors, and we get the well-known representation

$$y_i = r_i(x_i, \pi) + \varepsilon_i, \qquad i = 1, \dots, n. \tag{53}$$

Conversely, we can also denote the functional relations as regression models
with errors in the regressors.

Concluding this section, we want to discuss some differences in the application
of regression models and more general models. Obviously regression models
can be used to give a prediction of the 'dependent' observations $y_i$ on the
basis of the 'independent' variables $w_i$ and $x_i$, even if the underlying 'correct'
observation model includes errors in the variables. In such a case one wants
to know the conditional expectation $E(y \mid x, w)$, which yields the best prediction
of $y$ by $w$ and $x$ under quadratic loss (cf. Rao 1973, 4.g.1). For this, regression
models of the form

$$y_i = r(\kappa, x_i, w_i) + \varepsilon_i \tag{54}$$

are suitable. Then an estimate $\hat\kappa$ also yields, via $r(\hat\kappa, \cdot, \cdot)$, an estimate of the
conditional expectation. In contrast to this, the general errors-in-variables
models serve to model the system behaviour, that is, to predict the
'dependent' state variables $\eta_i$ after fixing the 'independent' $\xi_i$.

This difference can be made clear by an example. For instance, a safety
inspector supervising the erection of a building might be interested in knowing
which value the true shearing stress $\eta$ attains, given that a surface load $x$ was
observed. We would apply a predictive regression model. But for an engineer
designing a building it is more important to know what the true shearing
stress $\eta$ would be if the true surface load $\xi$ attains a specific value. He would
apply an errors-in-variables model.

Moreover, regression models are suitable if the error of the measurement
of the 'independent' state variable $\xi_i$ is small compared with the error in the
'dependent' state variables. They are also suitable if the equation error is
large in comparison with the error in the measurement of the independent state
variables (cf. Section 2.1.6). This is, for instance, the case in variance analysis.
Here the regressors $w_i$ are exactly measurable 0-1 vectors and then $\varepsilon_i$ includes
the comparatively large model error. For a long time it has been known that
in a linear system model where the state surfaces are linear subspaces, the
corresponding regression function for the observations need not be linear
(cf. Kendall and Stuart, 1961, 29.56-59).


3.1.3.6 Functional relations with random experimental design 


The models considered so far have been based on a deterministic model of
the form $0 = s(\mu_i, \pi)$, where fixed design points $\mu_i \in S_\pi$ were given. Among
other things, it is characteristic of such models that the number of unknown
parameters $\mu_i$, $i = 1, \dots, n$, increases with the number of observations. Since
the distribution of each single observation $z_i$ contains only the parameter
$\mu_i$ and none of the others $\mu_j$, $j \neq i$, the $\mu_i$ are called incidental parameters (cf.
Definition 3.1.3).

A totally different situation arises if the experimental design is randomly
distributed on $S_\pi$. The $\mu_i$ are often supposed to be independent
and identically distributed, and then the number of parameters does not increase
with the number of observations. Dependences have been admitted more recently
(cf. Robinson, 1977). For the corresponding sequence of models the dimension
of the parameters of interest does not increase, although the dimension
of the incidental parameter may indeed increase:


Definition 3.1.6 Explicit functional relations with random incidental parameters 
are defined by 


$z_i = \mu_i + \zeta_i$, $\mu_i = [\xi_i, \eta_i]$,
$\eta_i = r_i(\xi_i, \pi)$ (55)
with a distributional model for the stochastic variables 
Win) >= [Sn im] © FP? = (P, | % € K}, (56) 
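where the stochastic independence of $\xi_{(n)}$ and $\zeta_{(n)}$ is mostly demanded:
$P_{\theta,\gamma} = P^{\xi}_{\theta} \otimes P^{\zeta}_{\gamma}$, $[\theta, \gamma] \in \Theta \times \Gamma$. (57)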


Of course this definition can also be applied to implicit models. 
Then we have to demand additionally in the distributional model that 
Px(0 = 8(2, 2, U, $)) = 1 holds for y = [2,u,$]. a relation that is always 
satisfied in the described explicit functional relations. This model can also 
be written in a more compact form corresponding to Definition 3.1.4:


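$z_{(n)} = \mu_{(n)} + \zeta_{(n)}$, $\mu_{(n)} = [\xi_i, \eta_i]_{i=1,\dots,n}$, $z_{(n)} = [x_i, y_i]_{i=1,\dots,n}$,
$\eta_{(n)} = r_{(n)}(\xi_{(n)}, \pi)$,
$\xi_{(n)} \sim \mathcal{P}^{\xi} = \{P^{\xi}_{\theta} \mid \theta \in \Theta\}$, $\zeta_{(n)} \sim \mathcal{P}^{\zeta} = \{P^{\zeta}_{\gamma} \mid \gamma \in \Gamma\}$. (58)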




3.1.4  Identifiability 


Now we turn to fundamental questions of identifiability. We investigate
only functional relations in which all state variables are observed with
errors. Corresponding to the possible distributional suppositions, we have to
distinguish between models with random and those with nonrandom incidental
parameters. Both have the same state equations, so one could expect
them to have similar properties with respect to identifiability. But
the different distributional assumptions result in totally different
situations with respect to the distribution of the observation $z_{(n)}$:

For a random experimental design $\mu_{(n)}$ we have


Pe = pert, (59) 


Here the structural parameter $\pi$ influences the distribution of the observations
only indirectly, via the distribution


Pte= Pee (60) 


But with the exception of some special cases the distribution of $z = \mu + \zeta$
can be rather difficult to obtain.

We have a completely different situation with a nonrandom experimental
design $\mu_{(n)}$. Here


Pe = Pett — px, (61) 


As $\mu$ is fixed now, the distribution of $z = \mu + \zeta$ is obtained without difficulty
from that of $\zeta$ by shifting the mean value. The structural parameter $\pi$ does
not directly occur in the parameters of the observational distribution $P^{z}$.
Other problems arise with these models because the number of unknown
parameters increases with the number of observations.

This problem is impressively demonstrated by the failure of the maximum 
likelihood method, which initiated a great deal of discussion. 

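For this purpose we consider the simplest LIFU$^+$ model by (1), (2):


$P^{\zeta} := N(0, \Sigma(\gamma))$, $\Sigma = \begin{pmatrix} \sigma_\delta & 0 \\ 0 & \sigma_\varepsilon \end{pmatrix}$, $\gamma = [\sigma_\delta, \sigma_\varepsilon] \in \mathbb{R}^+ \times \mathbb{R}^+$,


with stochastically independent $\zeta_i$.
Then


$\psi_n = [\alpha, \beta; \xi_1, \dots, \xi_n; \sigma_\delta, \sigma_\varepsilon]$ (62)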




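and the likelihood function is


$l(\psi_n) \propto (\sigma_\delta \sigma_\varepsilon)^{-n/2} \exp\left\{ -\frac{1}{2\sigma_\delta} \sum_{i=1}^{n} (x_i - \xi_i)^2 - \frac{1}{2\sigma_\varepsilon} \sum_{i=1}^{n} (y_i - \alpha - \beta \xi_i)^2 \right\}$. (63)


It can easily be shown (cf. Lindley, 1947) that the relation


$\hat\sigma_\varepsilon = \hat\beta^2 \hat\sigma_\delta$ (64)


follows for the MLE.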

The consistency we know from other statistical models would only result
if $\sigma_\varepsilon = \beta^2 \sigma_\delta$. This failure of the likelihood method in LIFU has been
discussed in detail in the literature (see Section 3.1.6). One could conclude that
models with a fixed experimental design can be applied only with
difficulty, and that consequently only models with random experimental
design should be used. But the distribution of $z$ can be given explicitly only in a few
cases, among them normally distributed $\mu_i$ and $\zeta_i$, which indeed cover
most of the univariate random quantities met in practice.
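
The inconsistency expressed by (64) can be made visible in a small simulation. Setting the derivatives of the logarithm of (63) with respect to the incidental parameters and the two error variances to zero gives (64); combined with the likelihood equation for the slope, the stationary point has squared slope equal to the ratio of the empirical variances of the $y_i$ and the $x_i$. The following Python sketch is only an illustration under assumed values (design points equally spaced on $[-2, 2]$, intercept 1, slope 2, unit error variances); the function names are hypothetical and not taken from the text.

```python
import numpy as np

def stationary_ml_slope(x, y):
    """Slope at the stationary point of likelihood (63): beta^2 = s_yy / s_xx."""
    sxx = np.var(x)                                   # empirical variance of x
    syy = np.var(y)                                   # empirical variance of y
    sxy = np.mean((x - x.mean()) * (y - y.mean()))    # empirical covariance
    return np.sign(sxy) * np.sqrt(syy / sxx)

def simulate(n, alpha=1.0, beta=2.0, var_delta=1.0, var_eps=1.0, seed=0):
    """Draw one sample from the functional model with a fixed design."""
    rng = np.random.default_rng(seed)
    xi = np.linspace(-2.0, 2.0, n)                    # nonrandom design points
    x = xi + rng.normal(0.0, np.sqrt(var_delta), n)   # x_i = xi_i + delta_i
    y = alpha + beta * xi + rng.normal(0.0, np.sqrt(var_eps), n)
    return stationary_ml_slope(x, y)

slope = simulate(20000)
# Limit of the stationary slope for this design (mean square of xi tends to 4/3):
# sqrt((beta^2 * 4/3 + var_eps) / (4/3 + var_delta)) = sqrt(19/7)
limit = np.sqrt(19.0 / 7.0)
print(slope, limit)
```

Under these assumptions the stationary slope settles near $\sqrt{19/7}$ instead of the true slope 2; it would approach the true slope only if $\sigma_\varepsilon = \beta^2 \sigma_\delta$.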

But precisely this case causes fundamental difficulties for LIFU$^-$ in
identifiability. The following fundamental theorem goes back to Reiersol (1950a).


Theorem 3.1.1 Let the simple LIFU$^-$ model be given according to (5), (6).
Then $\beta$ is identifiable if and only if at least one component of $\mu_i$ is not normally
distributed.

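Proof. The characteristic function of $\zeta_i$ is


$\varphi_\zeta(t) = \exp\{-t'\Sigma t/2\}$, $t \in \mathbb{R}^2$, (65)

and for the $\mu_i$ it holds that


$\varphi_\mu(t) = \exp(i\alpha t_2)\,\varphi_\xi(t_1 + \beta t_2)$, (66)

with $i$ as the imaginary unit. For $z_i$ it thus follows that

$\varphi_z(t) = \exp(i\alpha t_2 - t'\Sigma t/2)\,\varphi_\xi(t_1 + \beta t_2)$.

Now let

$\tilde\psi = [\tilde\alpha, \tilde\beta, \tilde\theta, \tilde\Sigma] \neq \psi$,

but $P^{z}_{\tilde\psi} = P^{z}_{\psi}$. With this we would also get $\varphi_z = \tilde\varphi_z$, i.e.

$\exp(i\alpha t_2 - t'\Sigma t/2)\,\varphi_\xi(t_1 + \beta t_2) = \exp(i\tilde\alpha t_2 - t'\tilde\Sigma t/2)\,\tilde\varphi_\xi(t_1 + \tilde\beta t_2)$. (67)

For $\tilde\beta \neq \beta$ we could find numbers $t_1$, $t_2$ such that

$t_1 + \beta t_2 = s$, $t_1 + \tilde\beta t_2 = 0$ (68)

holds, where $s \in \mathbb{R}$ is arbitrarily given, namely


$t_1 = -\tilde\beta s/(\beta - \tilde\beta)$, $t_2 = s/(\beta - \tilde\beta)$.

Inserting in (67), and using $\tilde\varphi_\xi(0) = 1$, it follows that


$\varphi_\xi(s) = \exp(i a_1 s + a_2 s^2)$

with

$a_1 := (\tilde\alpha - \alpha)/(\beta - \tilde\beta)$,

$a_2 := [-\tilde\beta, 1]\,(\Sigma - \tilde\Sigma)\,[-\tilde\beta, 1]' / \big(2(\beta - \tilde\beta)^2\big)$.

Consequently $\xi_i$ is normally distributed or a constant, by a known
characterization theorem (cf. Kagan, Linnik, and Rao, 1973). This can be shown
analogously for $\eta_i$.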
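
The necessity part of Theorem 3.1.1 can be checked numerically in the fully normal case. A bivariate normal law is determined by its mean and covariance, so two parameter sets with different slopes are indistinguishable as soon as they produce the same first and second moments of the observation. The sketch below assumes a normally distributed $\xi_i$ and uncorrelated normal errors; the parameter values and the function name `observation_law` are chosen for illustration only.

```python
import numpy as np

def observation_law(alpha, beta, mean_xi, var_xi, var_delta, var_eps):
    """Mean and covariance of z = [xi + delta, alpha + beta * xi + eps]
    for normal xi and independent normal errors delta, eps."""
    mean = np.array([mean_xi, alpha + beta * mean_xi])
    cov = np.array([
        [var_xi + var_delta, beta * var_xi],
        [beta * var_xi, beta**2 * var_xi + var_eps],
    ])
    return mean, cov

# Two different parameter sets (slopes 1.0 and 0.8) ...
mean_1, cov_1 = observation_law(1.0, 1.0, 1.0, 2.0, 1.0, 1.0)
mean_2, cov_2 = observation_law(1.2, 0.8, 1.0, 2.5, 0.5, 1.4)

# ... that induce the same normal distribution of the observation.
same_law = np.allclose(mean_1, mean_2) and np.allclose(cov_1, cov_2)
print(same_law)
```

Both parameter sets give mean $[1, 2]$ and covariance entries 3, 2, 2, 3, so no sample size can separate the slopes 1.0 and 0.8 from observations alone.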
Corollary 3.1.1 $\alpha$ is always identifiable if $\beta$ is identifiable.


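Proof. Let $\tilde\beta = \beta$ and $P_{\tilde\psi} = P_{\psi}$. Then we have $t_1 = s - \beta t_2$ in (68). Having
inserted this in (67) and having taken the logarithm, it results that


$\ln\big(\varphi_\xi(s)/\tilde\varphi_\xi(s)\big) = i(\tilde\alpha - \alpha)\,t_2 - \frac{1}{2}\,[s - \beta t_2,\, t_2]\,(\tilde\Sigma - \Sigma)\,[s - \beta t_2,\, t_2]'$

$= i(\tilde\alpha - \alpha)\,t_2 - \frac{1}{2}\,(s, t_2) \begin{pmatrix} 1 & 0 \\ -\beta & 1 \end{pmatrix} (\tilde\Sigma - \Sigma) \begin{pmatrix} 1 & -\beta \\ 0 & 1 \end{pmatrix} \begin{pmatrix} s \\ t_2 \end{pmatrix}$. (69)


As this equation holds identically in $s$ and $t_2$, and its left-hand side does not
depend on $t_2$, the coefficients of the monomials $s t_2$, $t_2$, $t_2^2$ on the
right-hand side have to vanish. This yields


$\alpha = \tilde\alpha$, (70)

$\sigma_{\delta\varepsilon} - \tilde\sigma_{\delta\varepsilon} = \beta(\sigma_\delta - \tilde\sigma_\delta)$,
$\sigma_\varepsilon - \tilde\sigma_\varepsilon = \beta(\sigma_{\delta\varepsilon} - \tilde\sigma_{\delta\varepsilon}) = \beta^2(\sigma_\delta - \tilde\sigma_\delta)$. (71)


(Similar conclusions follow from the consideration of $\varphi_\eta$, of course.)
Equation (70) means the identifiability of $\alpha$.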


It is interesting that the remaining components of $\psi$ do not necessarily have
to be identifiable. The next theorem shows that, at least in the bivariate
LIFU$^-$, they are identifiable in principle only for regression models, i.e. if
one of the variables is observed without errors.


Theorem 3.1.2 Let $\beta$ be identifiable. Then the other components of $\psi = [\alpha, \beta, \theta, \Sigma]$
are identifiable if and only if the following conditions are fulfilled:


1. Either $\delta_i = 0$ or $\varepsilon_i = 0$.
2. Neither $\xi_i$ nor $\eta_i$ has a distribution that is divisible by a normal distribution.



Proof. If $\beta$ is identifiable, so is $\alpha$ (Corollary 3.1.1). Then $[\theta, \Sigma]$ is not identifiable
if there exists a $\tilde\psi \neq \psi$ with $P_{\tilde\psi} = P_{\psi}$, where $\alpha = \tilde\alpha$, $\beta = \tilde\beta$. This is equivalent
to the validity of (71) and (67). Let us now insert (71) in (67). With this, the
nonidentifiability of the remainder of $\psi$ is equivalent to the existence of pairs
$[\tilde\theta, \tilde\Sigma] \neq [\theta, \Sigma]$ which satisfy (71) and


$\tilde\varphi_\xi(s) = \varphi_\xi(s) \exp\{-(\sigma_\delta - \tilde\sigma_\delta)s^2/2\}$ (72)


at the same time. (Divisibility of the distribution of a random quantity $v$ by
that of $u$ means that there exists a random quantity $w$ with $\varphi_v = \varphi_u \varphi_w$.)

Now the identifiability follows from conditions 1 and 2: if (71)
and (72) were satisfied at the same time, we would have $\tilde\varphi_\xi(s) \exp\{-\tilde\sigma_\delta s^2/2\}$
$= \varphi_\xi(s)$ because of $\sigma_\delta = 0$, and this would contradict condition 2. (The case
$\sigma_\varepsilon = 0$ can be treated with the equation for $\varphi_\eta$ corresponding to (72).)

Conversely, suppose one of the conditions is not satisfied. If condition 1 is
not fulfilled, we choose $\tilde\sigma_\delta$ with $\sigma_\delta > \tilde\sigma_\delta > 0$ and $\sigma_\varepsilon - \beta^2(\sigma_\delta - \tilde\sigma_\delta) > 0$. Then (72)
and (71) are fulfilled. If condition 2 is not satisfied, it holds e.g. for $\xi_i$
that


$\varphi_\xi(s) = \varphi_u(s) \exp(ims) \exp(-\sigma s^2/2)$.


Then we choose $\tilde\sigma_\delta > 0$ sufficiently small so that $\sigma + (\sigma_\delta - \tilde\sigma_\delta) > 0$ and
$\tilde\sigma_\varepsilon := \sigma_\varepsilon - \beta^2(\sigma_\delta - \tilde\sigma_\delta) > 0$ holds. The function $\tilde\varphi_\xi$, determined according to
(72), is a characteristic function and (71) is fulfilled at the same time.
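
The construction used in the converse part of the proof can be verified numerically. The sketch below assumes a non-normal $\xi_i$ (uniform on $[-1, 1]$, chosen only for illustration), both error variances equal to 1 and uncorrelated errors, so that condition 1 is violated. Part of the error variance, an amount `d`, is moved into the distribution of $\xi_i$ as in (72), and the error covariance matrix is changed according to (71). The function names and all numerical values are assumptions of this example.

```python
import numpy as np

def cf_uniform(s):
    """Characteristic function sin(s)/s of xi ~ Uniform(-1, 1)."""
    return np.sinc(s / np.pi)          # np.sinc(x) = sin(pi x) / (pi x)

def cf_z(t1, t2, alpha, beta, cf_xi, Sigma):
    """Characteristic function of z = [xi + delta, alpha + beta * xi + eps]."""
    t = np.stack([t1, t2])                          # shape (2, m)
    quad = np.einsum('ik,ij,jk->k', t, Sigma, t)    # t' Sigma t for each point
    return np.exp(1j * alpha * t2 - quad / 2.0) * cf_xi(t1 + beta * t2)

alpha, beta, d = 0.5, 1.0, 0.3
Sigma = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, beta])
Sigma_alt = Sigma - d * np.outer(b, b)              # relations (71)

def cf_xi_alt(s):
    """Alternative law of xi: the uniform law convolved with N(0, d), cf. (72)."""
    return cf_uniform(s) * np.exp(-d * s**2 / 2.0)

rng = np.random.default_rng(1)
t1, t2 = rng.uniform(-5.0, 5.0, size=(2, 1000))
same = np.allclose(cf_z(t1, t2, alpha, beta, cf_uniform, Sigma),
                   cf_z(t1, t2, alpha, beta, cf_xi_alt, Sigma_alt))
pos_def = bool(np.all(np.linalg.eigvalsh(Sigma_alt) > 0))
print(same, pos_def)
```

The two parameter sets share the intercept and the slope but differ in the error covariance and in the law of $\xi_i$, and they give the same characteristic function of the observation at every test point. Note that the alternative error covariance matrix is no longer diagonal, in line with the middle relation in (71).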


Theorems of this kind were also proved by Reiersol (1950a) for LIFU$^-$ with
independent error components $[\delta_i, \varepsilon_i]$ which do not necessarily have to be
normally distributed. Some statements concerning this can also be found in
Lukacs and Laha (1964, ch. 6.3). By means of general characterization
theorems we obtain theorems for the multivariate case, too. A detailed analysis
is given by Rao (1966) and, with other methods, by Jeeves (1954).

Of course, such statements cannot be used to check existing practical
models for identifiability. Although the normal distribution
provides a sufficient approximation for many practical purposes, it is always
only an approximation, because the supports of the distributions
in practical cases are always bounded. Thus identifiability is theoretically
secured in practice. The principal importance of such identifiability
theorems lies elsewhere: identifiability may hold theoretically and yet be
so ‘weak’ that the sample size necessary for practical identification
cannot be realized. A way out is the replication of the experiments
or the use of other additional information.

In contrast, with nonrandom incidental parameters we always have the
identifiability of $\mu$ and $\gamma$. From $P^{z}_{\psi} = P^{z}_{\tilde\psi}$ we obtain


$\mu = E_{\psi} z = E_{\tilde\psi} z = \tilde\mu$. (73)




And $\gamma$ remains identifiable in the family $\mathcal{P}^{z}$ if it was identifiable in $\mathcal{P}^{\zeta}$. Then
$\mu = \tilde\mu$ also yields


High ese ed (74) 
and equivalently 
Pi = FF (75) 


ye 
hence 


Baltes oYbxc (76) 


There still remains the question of the identifiability of the structural
parameter, which is not always secured, as the following simple
example shows. Let us take the straight lines in $\mathbb{R}^2$ as structural bundle.
According to the previous considerations $\mu_{(n)}$ is uniquely determined by
$P^{z}$, but for $n = 1$ this does not imply the unique determination of a
corresponding straight line.

In this example two design points are already sufficient for almost all
experimental designs on the true structure to be identifying for
the straight-line parameter, as is intuitively obvious. But for more general
nonlinear models these problems demand more detailed
investigations, which will be reported on briefly at the end of the section. For linear
models we can prove the following identifiability theorem with elementary
means.


Theorem 3.1.3. For LIFU* with a linear regression part the linear subspace 
£ = KL) is identifiable iff r|W] = n. 


Proof. 7[UPy’1] = 1, holds for r[W] = n. Hence 
R(E(Z(L = Py'1))) = R(M,UPy 1) = R(M,) = £. 


Conversely let r[ W] < n. Then there exists a vector § € R" withE’W = 0€Myy. 
Now we choose a & € M?>%,),, with & as the first row. We get EW = [0, é], 
EE My—q—yxe Let L = (Ui £) € Meh, 1 € IR®. We choose L® = (i | L) 
with t, | A(L®) =: £, Then the relation 


HM) = £, + RM) = 4, 


holds for M® := L£, where in M{" is the block of M® = [M™ i mM] 
belonging to U in W = [U1 V]. But we have 


Mow = MOW = LE. 


Consequently P*: = P** does not yield £, = f, and thus £ = R(M,) is 
not identifiable. 




It should be taken into account that this does not answer the question which 
and ‘how many’ experimental designs on the true subspace identify this sub- 
space. The theorem only states the following: in case an experimental design 
is contained in a subspace from £ d,-.> then we can truly identify this from the 
observations. However, for other asymptotic investigations we need a
description of such models where one fixed experimental design cannot be contained
in two different structures at the same time. Details may be found in Höschel
(1978b, 1986).


Definition 3.1.7 Let a fixed structure S, from the bundle Sy be given. An ex- 
perimental design vn) on Sx, is called identifying for S, — with respect to the 
bundle Sy — if vn) is not contained in any other structure from S,. 
(Of course we assume that exactly one parameter only corresponds to each 
structure.) 

In this case 


IU(Y(ny) = 8, = {7 | [2n)> ze] € S} 


is a 1-element set. In particular, this is the case if z(v) is empty. Then v does 
not lie on any state manifold from the model and consequently is not identi- 


fying. 


Definition 3.1.8 An explicit model with the system functions $\eta_i = r_i(w_i, \xi_i, \pi)$
for the observed states is called internally (structure) identifiable if, for all $\pi \in \Pi$
and almost all $\xi_{(n)}$, $w_{(n)}$, the experimental designs $\nu_{(n)} = ([w_i, \mu_i])_{(n)}$ induced by
means of the system function on the state manifold $S_\pi$ are identifying.


Theorem 3.1.4 (Identifiability theorem for nonlinear models) We obtain
internal identifiability if:


1. there exist sufficiently many experimental design points, namely $n > \dim \Pi$;
and

2. the corresponding state manifolds $S_\pi$ have no weak contact of infinite order for
different structural parameters (cf. Höschel, 1986).


We notice that most of the functions that are practically applied in curve- 
fitting satisfy this second assumption: polynomials, exponential and trigono- 
metric functions. For regression models we obtain the following result. 


Theorem 3.1.5 In a regression model almost all experimental designs $w_{(n)}$
are identifying if:


1. $n > \dim \Pi$; and
2. different state manifolds do not have a weak contact of infinite order.


In this case the state manifolds are given by


$S_\pi = \{\nu_{(n)} = ([w_i, \eta_i])_{i=1,\dots,n} \mid \eta_i = r(w_i, \pi),\ i = 1, \dots, n\}$.




An illustration results if we recall that 
Sty — 40 — [= Te) 


is the graph of the related regression function in the state space IR® for fixed z. 

To prove some fundamental properties of estimators in errors-in-variables
models, especially consistency, we need the identifiability just treated
not only for almost all but for all states. We can always achieve this
by an inessential restriction of the parameter region, provided the assumptions
of the identifiability theorem are fulfilled. For this purpose we only have to
omit, for each $\pi \in \Pi$, the null set of those independent state vectors $w_i$, $\xi_i$,
$i = 1, \dots, n$, that provide nonidentifiability. Over this parameter region
all parameters are identifiable from the observation distribution, so
that $P^{z}_{\psi_0} = P^{z}_{\psi_1}$ yields $\psi_0 = \psi_1$: we get $\mu_{(n)1} = \mu_{(n)0}$ and $\gamma_1 = \gamma_0$ as in
(74), (76). Finally, since $\mu_{(n)0}$ is among the identifying states on the state
manifold $S_{\pi_0}$ because of the restriction of the parameter region, $\pi_1 = \pi_0$ also
follows.


3.1.5 On the existence of consistent estimators 
under a nonrandom experimental design 


In the last section it was argued that random design can cause nonidentifiability,
and inconveniently just for the normal distribution, which has proved its
practical value in many other applications. On the other hand, for design
distributions other than the normal, the distribution of the observations
cannot be obtained easily. Nonrandom designs do not have these disadvantages:
the distributions of the observations can easily be described and the
parameters are practically always identifiable. This is why one would
be tempted to use nonrandom designs exclusively for modelling. However,
the ‘failure’ of the MLE described in Section 3.1.4 indicates that we have to
construct other estimators for certain models to obtain consistency. In this
section it will be shown that for a large class of models with nonrandom design,
consistent estimators cannot exist. Consistency is a property of convergence
for infinite sequences of observations. With an increasing number of incidental
parameters we also have to take into account infinite sequences of parameters.
With a nonrandom experimental design the distribution of $z_{(\infty)}$ is induced
by a distribution supposition for $\zeta_{(\infty)}$:


Pico = Pk coy FF 0009 , Uy Ss UWicsys TT, y]. 


Here $\mu_{(\infty)}$ satisfies the equations $0 = s_i(w_i, \mu_i, \pi)$.
With random experimental design the distribution of $\mu_{(\infty)}$ is described by a
parameter $\theta \in \Theta$: $\mu_{(\infty)} \sim P_{\theta,\pi}$, where the structures $\times_{i=1}^{\infty} S_{i\pi}$ have to be the
supports of the distribution $P_{\theta,\pi}$. Hence, for all $i = 1, 2, \dots$, it has to hold almost
surely that $0 = s_i(w_i, \mu_i, \pi)$. With explicit models this holds from the very
beginning, since the $\eta_i$ may have an arbitrary distribution which is induced by
a distribution of $\xi_i$. In case $\mu_i$ and $\zeta_i$ are stochastically independent, the
distribution of the observations $z_{(n)}$ is uniquely determined by the parameter
$\psi = [\theta, \pi, \gamma]$.


Definition 3.1.9 In an errors-in-variables model an estimator $\hat\pi = (\hat\pi_n)$
that is defined for all natural numbers $n$ is called consistent (respectively strongly
consistent) for $\pi$ if


$\hat\pi_n(z_{(n)}) \to \pi$ in probability (resp. $\hat\pi_n(z_{(n)}) \to \pi$ almost surely) as $n \to \infty$

holds for each parameter $\psi = [\mu_{(\infty)}, \pi, \gamma] \in \Psi$.


(In general errors-in-variables models the estimator $\hat\pi$ still depends on the
exactly observed regressors $w_{(n)}$.)

We say that a model with random experimental design belongs to a
model with nonrandom experimental design if both models have the same
state equation. If, in a model with random experimental design, $\mu_{(n)}$ is
regarded as a realization of the random design, we can formally establish the
related model with nonrandom experimental design. Then we can give a
conditional estimator $\hat\pi(z_{(n)} \mid \mu_{(n)})$, from the distribution of which we can obtain
the unconditional estimate of the original model, at least in principle. For
instance, let $\mu$ and $\zeta$ be independent; then


$P^{z} = \int P^{z \mid \mu}\, dP^{\mu}(\mu) = \int P^{\mu + \zeta \mid \mu}\, dP^{\mu}(\mu)$,


where $P^{\mu + \zeta \mid \mu}$ is the conditional distribution of $z$ for fixed $\mu$. Such a
procedure corresponds to a conditional inference in errors-in-variables models
(Madansky, 1959, p. 175; Kendall, 1951, p. 17; Malinvaud, 1966, p. 378).
The following result, which demonstrates the close connection between both
models, is valid.

Theorem 3.1.6 In a model with random experimental design let the structural
parameter be nonidentifiable. Then it cannot be consistently estimated in the
corresponding model with nonrandom experimental design.

Proof. Suppose $\hat\pi$ is a consistent estimator in the model with nonrandom
experimental design. Then we have


lim P iA | ny = Mays Scny = Sm} = 1 


with the set | 
n—oo 


$\psi$ contains the parameter for the distributions of $[\mu_{(\infty)}, \zeta_{(\infty)}]$ and the structural
parameter $\pi$.




By a general zero-one law (cf. Loève, 1978, 32.4.A)
P(A) = 1, 


hence $\hat\pi$ is consistent in the model with random experimental design. But this
contradicts the assumed nonidentifiability, because for at least one pair
$\pi \neq \tilde\pi$ it should hold that


i 
Py = PF 
Because of the consistency it would follow that $\hat\pi$ converges both to $\pi$ and to $\tilde\pi \neq \pi$, a contradiction.


The theorem shows that, without additional assumptions on the sequence
$\mu_{(\infty)}$, we cannot obtain consistent estimators in general, because without such
assumptions the sequence $\mu_{(\infty)}$ could also arise as a realization from a
model with a nonidentifiable structural parameter. Nussbaum (1978a)
described this connection for linear functional relations.

We obtain identifiability, for example, if we carry out repeated observations
of a fixed experimental design. Above all, such models occur in the natural and
industrial sciences. Identifiability also holds if the design points lie on a straight
line and $\|\mu_i\| < \|\mu_{i+1}\|$; this example is to be found in Ware (1972). But we cannot
consider each model with a fixed experimental design as belonging to one
with a random design. There are sufficiently many examples in practice in
which the experimental design cannot be considered as the realization of a
random variable. An example would be the periodic observation of a planet
after an exactly fixed time interval. Here the time error would be small in
comparison with all other errors in the model, and therefore the observation
time may be considered as nonrandom or deterministic.


3.1.6 Bibliographic remarks 


The aim of this section is to ease orientation among the great number of
results now available and to facilitate access to the literature. Furthermore,
it motivates the presentation in the following sections compared with other
monographs and indicates open problems. It is based on a bibliography
that aimed at completeness. Of course, the following selection is
subjective.

The basic classification is carried out according to nonrandom and random
experimental design, first for statistical estimators and then for the other
statistical inference methods. The first works on errors-in-variables models
concerned the bivariate LIFU$^+$ model: Adcock (1877, 1878), Kummel (1879),
Pearson (1901). They treat the estimation of straight lines by
minimization of the orthogonal and the weighted squared distances between
observations and a straight line. This method of least squares, which provides
the MLE for normally distributed errors, has become the most widely applied
method both theoretically and computationally. For some periods in the
1940s and 1950s, discussion was dominated by the previously mentioned
fundamental questions of statistical inference in such models and, most of all,
by simpler estimation methods.
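
The orthogonal least-squares fit of a straight line mentioned above can be sketched in a few lines. The example assumes equal error variances in both coordinates, so that plain (unweighted) orthogonal distances are appropriate; the function name `orthogonal_line` is hypothetical. The direction of the fitted line is the leading right singular vector of the centred data, and the line passes through the centroid.

```python
import numpy as np

def orthogonal_line(x, y):
    """Fit y = a + b * x by minimizing the sum of squared orthogonal distances."""
    z = np.column_stack([x, y])
    center = z.mean(axis=0)                       # the fitted line passes through the centroid
    # Leading right singular vector = direction of largest spread of the centred data
    _, _, vt = np.linalg.svd(z - center, full_matrices=False)
    dx, dy = vt[0]
    b = dy / dx                                   # assumes the fitted line is not vertical
    a = center[1] - b * center[0]
    return a, b

# Usage on error-free points of the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
a, b = orthogonal_line(x, 1.0 + 2.0 * x)
print(a, b)
```

For unequal but known error variances the coordinates would first have to be rescaled, which corresponds to the weighted distances referred to in the text.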

With the recent development of computer techniques, the computation of
WLSE and related estimators has become possible for nonlinear models as well,
and the WLSE is correspondingly often applied and investigated. Uven (1930)
considered LIFU$^+$ with normally distributed errors and a covariance known
up to a factor. Koopmans (1932) proved this estimator to be an MLE and
approximated its variance for small error variances. For unknown
covariance Dent (1935) gave the solution of the likelihood equations.
Lindley (1947) showed that the resulting estimators fulfil an equation
that is incompatible with consistency. This gave rise to detailed discussions
about the application of the MLE in bivariate LIFU$^+$ with nonrandom design,
where the number of incidental parameters increases with the number of
observations. Following the fundamental work of Neyman and Scott (1948), Kiefer
and Wolfowitz (1953) were able to show that, under certain assumptions on the
distribution of the $\mu_{(n)}$ in the bivariate LIFU$^-$, the likelihood method also
provides consistency in the related LIFU$^+$. Although the practical value of
this method is very restricted, it showed that consistent estimation in
LIFU$^+$ is possible in principle. But it turns out that the maximum likelihood
solutions under unknown error covariance $\Sigma$ yield only a saddle-point
of the unbounded likelihood function. Anderson and Rubin (1956) proved the
unboundedness for the model of linear factor analysis, which represents only
a special LIFU.

In the literature Solari (1969) has mostly been cited; she rediscovered this
result and proved the MLS to be a saddle-point. From this, Sprent (1970) derived
some conclusions for the practical application of the method. Copas (1972)
showed that taking rounding errors into account in a certain form again
provides a global maximum. Sprent (1976) discussed more general questions which
arise from the likelihood method for LIFU$^+$. But even for known error variance
it has not been possible to prove the MLS to be a local extremum (Moberg and
Sundberg, 1978). An attempt to analyse the likelihood situation in more general
LIFU is to be found in Florens, Mouchart, and Richard (1976) and Willassen
(1979).

Multivariate LIFU, mainly LIFU$^+$ with normally distributed errors, are also
treated more and more. But the difficulties that already became
apparent when applying the MLE in bivariate models, and even more in
multivariate ones, demanded the assumption of additional information
about the sequence of the experimental designs or about the form of the error
distribution of $\zeta_{(n)}$.


Naturally, the cases of repeated independent observations of a fixed
experimental design play a special role here. For known error covariance Geary
(1948) gave the MLE as eigenvectors of a certain eigenvalue problem, which
previously had been found heuristically by Tintner (1945). For uncorrelated
normally distributed errors $\zeta_1, \dots, \zeta_n$ with unknown covariance Anderson
(1951a) gave the MLE for LIFU$^+$ with linear regression part. Because his
paper was titled ‘estimation of linear restrictions for regression coefficients’
(cf. (38), (39)), these results were overlooked for a long time in the works on
LIFU. Various partial results in special cases were thus rediscovered (Acton,
1959; Villegas, 1961; Nussbaum, 1976). The inconsistency of the ML estimators of
the covariance can be found in Villegas (1961) and Gleser and Watson (1973).

Hannan (1967) provided a complete consideration of the relations between 
linear simultaneous equations and canonical correlations. A unified derivation 
of these results with fixed covariance and a covariance to be estimated is 
finally possible on this basis. Then Robinson (1973, 1974) gave a unified deri- 
vation of the WLSE and MLE for a general time series model comprising the 
mentioned models. 

For bivariate LIFU with normally distributed incidental parameters, the
MLE was given for various models in a comparatively closed form (Cox, 1976;
Dolby, 1976a; Brown, 1978a; Chan and Mak, 1979a, b). One of the needed
minimization results is contained in Anderson (1951a) as a special case. Based
on GLSE, which are MLE under normal error distribution, we will derive
asymptotic properties, especially consistency and normality. Villegas (1966)
was already able to prove asymptotic optimality properties of the GLSE. In Barnett (1970)
a formula for the asymptotic variance in the case of replications is to be found, which
was corrected later by Patefield (1977). For the limited-information MLE (LIML)
in linear simultaneous equations, which corresponds to the MLE in LIFU$^+$ with
replications, Anderson and Rubin (1950) showed the asymptotic normality.

But such properties were also shown for the case that there are no replications, 
in Schneeweiss (1976) for a corrected OLSE under suppositions on higher-order 
moments. For a large class of instrumental variables estimators, Nussbaum 
(1977, 1978) derived the asymptotic normality and proved the optimality 
of the GLSE. In the work of Robinson (1974), mentioned above, we can also 
find the idea of proof for asymptotic normality. 

A survey and detailed discussions of further asymptotic results can be found 
in Anderson (1976). There the distribution of the WLSE is approximated if
$\|\mu_{(n)}\|^2$ becomes large and $n$ remains fixed. Thereby quite a number of known
results on linear simultaneous equations could be transformed. Patefield (1976) 
compared the results in a Monte Carlo study. Based on the reduced form of 
linear simultaneous equations (cf. (38) and (41)), all results can be transferred 
mutually. The limited-information MLE in linear simultaneous equations 
corresponds to the MLE in LIFU*. For known error covariance, asymptotic 
expansions of the density of LIML are derived by Mariano (1969). 

For the two-stage LSE the densities were developed by Basman (1961,
1963), Richardson (1968), Sawa (1969), and Sargan and Mikhail (1971). These
densities resulted in the form of double-infinite series of incomplete beta
functions. The distribution of the two-stage LSE results as a double series of
noncentral F-distributions (Anderson and Sawa, 1973). With estimated co- 
variance the density of the LIML was determined by Mariano and Sawa (1972). 
From this it resulted especially that also in LIFU for MLE there do not exist 
higher-order moments. This fundamental result stimulated modifications of 
the MLE for which higher-order moments exist. Fuller (1977) showed that these 
modifications, to the order $O(n^{-2})$, are better than a corresponding modification
of the ‘k-class’ estimators. These estimators contain the two-stage LSE and 
the OLSE, and were originally introduced by Theil (1958). Nagar (1959) gave 
approximations for the moments of the approximating distributions. Fuller 
(1977) also computed the optimal modification parameter and was thus able 
to explain the relatively bad power of the MLE compared with the two-stage 
LSE known from Monte Carlo studies. The approximations obtained in this 
way are valid for increasing sample size. Other approximations are obtained 
for small error variance (Kadane, 1970, 1971). The relation to those given above 
for increasing sample size was shown by Anderson (1977), who also referred 
to the fact (see also Brown, Kadane, and Ramage, 1974) that in Nagar (1959) 
and Kadane (1970, 1971) the results were erroneously interpreted as approxi- 
mate moments of the distribution. In this sense Robertson (1974) and Williams 
(1973) indeed only gave the moments of the approximating distributions in 
LIFU*. The asymptotic normality for the modified estimators is treated in 
detail by Fuller (1977). For a more general linear simultaneous equation model 
with instrumental variables, Robinson (1974) gave an elegant derivation of the 
MLE and the asymptotic normality. The proof provides a unified approach 
to many models. Some algorithms for WLSE in LIFU are investigated in 
York (1967), Spathe (1967), and Williamson (1968). 

A much-discussed work by Sprent (1966) was published on another application
of the least-squares principle in LIFU$^+$. By heuristic considerations
a kind of ‘GLSE’ was constructed there for LIFU$^+$ in which $\zeta_{(n)}$ has a general
but known covariance matrix $\Sigma$. Dolby (1972) showed the equivalence of these
‘GLSE’ with MLE (= WLSE) for normal errors as a special case in nonlinear 
functional relations. Another approach to the equivalence of WLSE and MLE 
in the normal case is investigated in Héschel (1978a). The equivariance is 
also shown there, and problems of identifiability are considered. 

Since 1960 nonlinear models and WLSE have been investigated more
intensively. But the first general algorithms already appeared in Deming (1931,
1943) and Oook (1931). The ideas described there for constructing WLSE by
linearization of the system function about the last iterate have remained of interest
until today. Only from 1960 on were WLSE algorithms developed for special
nonlinear models: in Hey and Hey (1960) for hyperbolas, in Robinson (1961) and
Chan (1965) for spheres. Clutton-Brock (1967) considered bivariate models and
Griliches and Ringstad (1970) investigated the bias of the OLSE in quadratic






models; O'Neill, Sinclair, and Smith (1969) treated general polynomial models.
For general explicit functional relations Dolby (1972) gave the WLSE equations
for known covariance of the $\zeta_i$, and Britt and Luecke (1973) determined them
for general implicit models of the form (7) with known covariance $\Sigma = \Sigma_{(n)}$.
Later on Dolby (1976b) showed the equivalence of both methods, namely
the ‘scoring’ method for explicit models and the method of Lagrange multipliers
for implicit ones.

For unknown error covariances, Dolby and Lipton (1972) investigated the
related Newton-Raphson algorithm for bivariate functional relations. The
asymptotic variance of the estimator for the case of replications is also to
be found there. With a slightly modified dependence structure of the error
covariance, $D(\zeta_{(n)}) = A \otimes \Sigma$ with $A$ known, this was further
developed for multivariate models by Dolby and Freeman (1975). Besides these
algorithms, statistical properties of the WLSE for nonlinear models have also
been established.
Villegas (1969) gave the asymptotic variance of such an estimator in the case 
of replications and he showed the asymptotic optimality in a certain class. 
Fuller and Wolter (1982) demonstrated under weaker assumptions the asympto- 
tic normality of a WLSE for univariate explicit models and gave the asympto- 
tic variances. This is also valid for a special estimation in quadratic functional 
relations (Fuller and Wolter, 1977). The sequence $\mu_i$ of the incidental
parameters has to satisfy certain weak assumptions. Modern numerical procedures
for the computation of WLSE may be found in Southwell (1976), MacDonald 
and Powell (1972). Globally convergent algorithms on the basis of regularized 
Gauss-Newton procedures have been developed by Höschel and Penev (1980)
and Tiller (1983). 

From this survey we can see the wide applicability of WLSE and related
estimators. Their efficiency is mainly based on the fact that they are obtained
by minimization of certain quadratic functionals. But there are quite a number
of further estimation methods for functional relations which use principles
other than the minimization of a quadratic functional. These methods were
investigated mainly in the 1950s. One of the most popular estimation methods
for bivariate LIFU* was the grouping method, where the straight line
runs through the means of two groups of observations. A work by Wald (1940)
made clear the possibility of consistent estimation under certain assumptions
on the realizations $\mu_{(n)}$ if all observations are used. In contrast,
Bartlett (1949) omitted a middle group of observations. Investigations of
the corresponding method in LIFU- followed (see below). Asymptotic variances
for such estimators can be found in Dorff and Gurland (1961b), who also
compared their properties with those of the OLSE for small sample sizes on the
basis of approximations of the bias and of the mean square error. These
approximations contained results of Housner and Brennan (1948).
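
The grouping idea described above can be sketched in a few lines. This is only a minimal illustration, not the procedure of any of the cited papers: it assumes that the groups are formed by ordering the observed x values, which is a convenience chosen here (consistency in Wald's sense needs a grouping that does not depend on the errors). The function names `wald_grouping_line` and `bartlett_grouping_line` are hypothetical.

```python
import numpy as np


def _group_line(x, y, size):
    """Line through the means of the `size` smallest and `size` largest x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if size < 1 or 2 * size > len(x):
        raise ValueError("not enough observations for two groups")
    order = np.argsort(x, kind="stable")      # assumption: group by observed x
    low, high = order[:size], order[len(x) - size:]
    slope = (y[high].mean() - y[low].mean()) / (x[high].mean() - x[low].mean())
    intercept = y.mean() - slope * x.mean()   # line passes through the overall mean
    return slope, intercept


def wald_grouping_line(x, y):
    """Two groups of equal size; for odd n the central point joins neither group."""
    return _group_line(x, y, len(x) // 2)


def bartlett_grouping_line(x, y):
    """Three groups of (roughly) equal size; the middle group is omitted."""
    return _group_line(x, y, len(x) // 3)
```

On error-free data lying exactly on a line, both functions return that line; with errors in both variables they generally differ from the ordinary least-squares fit.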

However, such grouping estimators can be understood as special
instrumental variables estimators (IVE). Such estimators were first studied by
Reiersol (1945), namely for linear simultaneous equations. Richardson and Wu
(1970) compared OLSE and other IVE on the basis of the variance. Sargan (1958,
1959) showed the relation between the two-stage LSE, which originates from 
the two-stage estimation in simultaneous equations, and the IVE. Guarian 
and Halperin (1971) provided the exact expressions for the variances for more 
general cases by means of hypergeometric functions. A combination
of IVE and OLSE was investigated by Feldstein (1974). In this combination the
greater losses of efficiency, which are characteristic of IVE, and the bias of
the OLSE compensate each other; approximations and Monte Carlo simulations
are used to support this. Investigations of efficiency for ranks as
instrumental variables, a method that goes back to Theil (1950a, b), are to
be found in Ware (1972). These results occur again in the general approach to
the asymptotic optimality of IVE in Nussbaum (1978). Further methods for
LIFU- are obtained in the case of replications, but also with groupings and
other instrumental variables, by using variance components according to
Tukey (1951). There is a good presentation in Madansky (1959, pp. 189-194).

Another method for LIFU*, which also allows estimations in the nonnormal 
case, results when using cumulants (Geary, 1942, 1943). In principle this method
can also be applied in LIFU- and for nonlinear models, but so far it has been
of theoretical interest only. Kendall and Stuart (1961) offer a good summary.
Newer methods of estimation in LIFU* mainly aim at reducing the variance 
of the estimators, if necessary at the expense of the unbiasedness. To this 
group we can assign the works by Lord (1960) and DeGracie and Fuller (1972). 

The difficulties in identifying and handling the distribution of observations 
are clearly reflected in the works on estimations with LIFU-. At the beginning, 
bivariate LIFU- were considered, and MLE of the different parameters, with 
additional information about the error covariance, were repeatedly published 
by various authors independently of each other (e.g. Kummel, 1879; Pearson, 
1901; Lindley, 1947) (cf. Madansky, 1959). Geary (1949) was the first to show 
the identifiability in the nonnormal case by means of his method of cumulants 
(cf. Scott, 1950; Drion, 1951). Before this, the nonidentifiability in special 
LIFU- had already been proved by Thomson (1919), Gini (1921), Frisch (1934), 
Neyman (1937), Koopmans (1932). Reiersol (1950a) gave the necessary and 
sufficient conditions. A detailed analysis of the multivariate case was worked 
out by Rao (1966); see also Jeeves (1954). 

The grouping method by Wald (1940) for LIFU* was modified for LIFU-, 
and comparisons of efficiency were carried out for special error distributions: 
Nair and Srivastava (1942), Bartlett (1949), Theil and van Yzeren (1956), Nair
and Banerjee (1943), Gibson and Jowett (1957). Dorff and Gurland (1961a, b)
calculated the estimator variances for small and large samples. The identifia- 
bility condition for this method was given by Neyman and Scott (1951). 

Other estimators constructed by Neyman (1951) and Wolfowitz (1952) 
remained of only theoretical interest because they are difficult to calculate. 
Spiegelman (1979) offered an estimator that can be calculated more easily. 

Some discussion was provoked by the situation of ‘overidentification’
in the normal case with known error variances, which arose from an incorrect
application of the MLE (cf. Kendall and Stuart, 1961, 29.11). Kiefer (1964),
Barnett (1967), and Birch (1964) showed that the correct application of the MLE
provides correct results.

The application of instrumental variables was investigated by Reiersol
(1945). Geary (1949) studied the efficiency. An example from biology is con- 
sidered in Carlson, Sabel, and Watson (1966). Lyttkens (1977) gives a survey. 
A special multivariate LIFU- is investigated in Barnett (1969), where 
the starting point was the calibration of measuring instruments in medi- 
cine. 

Estimations with repeated measurements by means of sample covariances 
are contained in Tukey (1951); Madansky (1959) gives a good survey.
Asymptotic variances are given in Dorff and Gurland (1961a), and a more
complicated estimator was constructed by Housner and Brennan (1948).

For LIFU with dependent observations, Robinson (1977) constructed a
contrast function which provides consistent estimators. The related
minimization problem is similar to WLSE problems. With a new approach based on
martingale theory, consistency and asymptotic normality are shown, the
corresponding variances are derived, and algorithms are given. This model
comprises LIFU* as well as LIFU-.

Problems of Bayesian inference were investigated by El-Sayyad and Lindley
(1968), Zellner (1971), Villegas (1972), Florens, Mouchart, and Richard (1974), 
where the basis was certain noninformative prior distributions. Some tests were 
constructed by Anderson (1951a) in his fundamental work on LIFU* with 
linear regression part (cf. Section 3.1.3). Independently, Williams (1955), 
Bartlett (1957), Basu (1969), and Moran (1956) considered some special prob- 
lems, e.g. for known error covariance. Confidence intervals may also be found 
in Anderson (1951a). Later on, special cases were considered once again in 
Creasy (1956), Brown and Fereday (1958), Halperin (1964) and Villegas (1964), 
among them also some for known error covariances. For LIFU- according
to (7), asymptotic tests and confidence intervals were developed in Cox
(1976).

Finally we want to mention some results on special models that are closely 
connected to functional relations. Controlled variables were considered in 
Berkson (1950) for LIFU and in Fedorov (1974) for nonlinear models. Lindley 
(1947) studied the problem of the relation between the conditional expectation
of $y$ given $x$ and the corresponding LIFU for $\xi$ and $\eta$. This resulted in
conditions under which a LIFU between the $\xi_i$ and $\eta_i$ provides a linear
regression of $y_i$ on $x_i$.

Mainly starting from econometric models, models with dependent errors 
and time-series models are considered. Concerning this, there are results in 
Robinson (1974), and for time series in Nowak (1975, 1976, 1977), Aigner (1966). 
The problems connected with such models are only partly touched upon in the 
present chapter. 




3.2 Maximum likelihood estimators 


The literature survey illustrated the central role of the MLE and the
associated WLSE. This section gives an overview of these estimators. First, in Section
3.2.1, bivariate linear functional relations are treated in detail, where in the 
initial model the experimental design is normally distributed. From this model 
we obtain on a unified basis a number of interesting special models and the 
corresponding estimation formulas, among them those for models that are 
known as ‘structural’ and functional relations in the literature. For such simple 
models there result estimation formulas that can still be handled on a small 
computer. This does not hold for multivariate models with random design so 
that in the multivariate case only models with nonrandom experimental 
design will be considered. 

In Section 3.2.2 we describe the relation between MLE and WLSE for 
general errors-in-variables models with nonrandom experimental design. 
Based on this, the MLE is comprehensively derived for multivariate LIFU*
with known error covariance in Section 3.2.3. The equivariance is also shown. 
In Sections 3.2.4 and 3.2.5 two important LIFU* with unknown covariance 
are investigated. In these cases we still can obtain the MLE in a relatively 
closed form, as the solution of an eigenvalue problem. 

For general models we can no longer give closed formulas. Sections 3.2.6 
and 3.2.7 describe some possibilities for simplifying numerical calculations in 
these special models. Finally, we briefly mention identifiability properties of 
estimators. 


3.2.1 Bivariate linear functional relations 
3.2.1.1 The general model 
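We consider the LIFU

\[
\begin{aligned}
\eta_{ij} &= \alpha + \beta\xi_{ij}, & i &= 1, \dots, n,\\
x_{ij} &= \xi_{ij} + \delta_{ij}, & j &= 1, \dots, m_i,\\
y_{ij} &= \eta_{ij} + \varepsilon_{ij},\\
\xi_{ij} &\sim N(\mu_i, \sigma_\xi), & \mu_i &\in \mathbb{R}^1,\ \sigma_\xi \ge 0,\\
\delta_{ij} &\sim N(0, \sigma_\delta), & \sigma_\delta &\ge 0,\\
\varepsilon_{ij} &\sim N(0, \sigma_\varepsilon), & \sigma_\varepsilon &\ge 0,
\end{aligned}
\tag{1}
\]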

which was described in Section 3.1, where the random variables are all
supposed to be independent of each other. Inserting known parameters yields
various important models as special cases. A model not included here, which
is nevertheless important, can be found at the end of Section 3.2.1.2. 

The uniform treatment of all models is based on the following principle. 
We insert the known parameter values or the corresponding functions of them
into the likelihood function for the general model, e.g. $\sigma_\xi = 0$ or
$\sigma_\xi = \rho_1\sigma_\delta$ in case $\rho_1$ is known. Then we obtain the
MLE with the known parameter values
as the maximum of the thus simplified likelihood function. The same holds for 
the related likelihood equations, where the roots of the gradient of the likeli- 
hood function are determined. These roots are the stationary points of the 
likelihood functional, one of which is the MLE. The equations for the speciali- 
zed models also result from the equations of the general one by inserting the 
known parameter values. 

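Now we describe the likelihood function. With fixed $\beta \in \mathbb{R}^1$ and
$\gamma := [\sigma_\xi, \sigma_\delta, \sigma_\varepsilon] \ge 0$,
$z_{ij} := [x_{ij}, y_{ij}]$ has the covariance

\[
D z_{ij} = \begin{pmatrix} \sigma_\xi + \sigma_\delta & \beta\sigma_\xi \\ \beta\sigma_\xi & \beta^2\sigma_\xi + \sigma_\varepsilon \end{pmatrix} =: \Sigma(\beta, \gamma). \tag{2}
\]

Further, let

\[ \mathbf{m}_i := [\mu_i,\ \alpha + \beta\mu_i], \tag{3} \]

and for $\Sigma \in M^{>}_2$ let

\[
l_z(\alpha, \beta, \mu_{(n)}, \Sigma) := -(m/2)\log\det[\Sigma] - \sum_{(i,j)} (z_{ij} - \mathbf{m}_i)'\,\Sigma^{-1}(z_{ij} - \mathbf{m}_i)/2. \tag{4}
\]

The log-likelihood function is then

\[ l_z(\vartheta) = l_z(\alpha, \beta, \mu_{(n)}, \Sigma(\beta, \gamma)) \tag{5} \]

with the parameter

\[ \vartheta := [\alpha, \beta, \mu_{(n)}, \gamma] \in \Theta \subset \mathbb{R}^{n+5}, \tag{6} \]

where $\Theta$ is the domain in $\mathbb{R}^{n+5}$ for which $\gamma$ lies in

\[ (\mathbb{R}^{\ge})^3 = \Gamma := \{\gamma \mid \sigma_\xi \ge 0,\ \sigma_\delta \ge 0,\ \sigma_\varepsilon \ge 0\} \tag{7} \]

and $\Sigma(\beta, \gamma)$ is positive definite.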
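More suitable for various purposes is an equivalent representation of $l_z$:

\[ l_z = -(m/2)\log\det[\Sigma] - (1/2)\operatorname{tr}\Sigma^{-1}S, \tag{8} \]

\[ S := \sum_{(i,j)} (z_{ij} - \mathbf{m}_i)(z_{ij} - \mathbf{m}_i)'. \tag{9} \]

A further representation results from splitting $S$ into a summand which is
independent of $\vartheta$, and one that depends on the observations $z_{ij}$
only via the group means $\bar z_{i.}$:

\[ S = W_z + S(\mu_{(n)}), \tag{10} \]

\[ W_z := \sum_{(i,j)} (z_{ij} - \bar z_{i.})(z_{ij} - \bar z_{i.})', \tag{11} \]

\[ S(\mu_{(n)}) := \sum_{i} m_i\,(\bar z_{i.} - \mathbf{m}_i)(\bar z_{i.} - \mathbf{m}_i)'. \tag{12} \]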
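
The splitting in (10) to (12) is the usual within/between decomposition of a scatter matrix and can be checked numerically. The following is a small sketch under stated assumptions: the group sizes, the simulated observations, and the trial mean vectors are arbitrary illustrative inputs, and the function name `scatter_matrices` is hypothetical.

```python
import numpy as np


def scatter_matrices(groups, means):
    """Return S, W_z and S(mu) for replicated two-dimensional observations.

    groups[i] is an (m_i, 2) array holding the observations z_ij of group i;
    means[i] is a trial mean vector for group i (illustrative input only).
    """
    S = np.zeros((2, 2))
    W = np.zeros((2, 2))
    S_mu = np.zeros((2, 2))
    for z, mean in zip(groups, means):
        z = np.asarray(z, dtype=float)
        mean = np.asarray(mean, dtype=float)
        z_bar = z.mean(axis=0)
        dev = z - mean                      # deviations from the trial mean
        S += dev.T @ dev
        res = z - z_bar                     # deviations from the group mean
        W += res.T @ res
        shift = z_bar - mean
        S_mu += len(z) * np.outer(shift, shift)
    return S, W, S_mu


rng = np.random.default_rng(1)
groups = [rng.normal(size=(m_i, 2)) for m_i in (3, 1, 4)]
means = [rng.normal(size=2) for _ in groups]
S, W, S_mu = scatter_matrices(groups, means)
print(np.allclose(S, W + S_mu))
```

Only the last term changes when the trial means change, which is why the maximization over the mean parameters can be separated from the within-group part.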


3.2.1.2 Replicated observations 


Now let at least one of the $m_i$ be greater than one. Thus there exists at least
one repeated observation. Then the likelihood function is bounded. To show
this, we notice first

\[
\max_{\vartheta} l_z \le \max_{\Sigma \in M^{>}_2}\ \max_{\alpha, \beta, \mu_{(n)}} \{-(m/2)\log\det[\Sigma] - (1/2)\operatorname{tr}[\Sigma^{-1}S]\}. \tag{13}
\]

All summands of $S$ are at least positive semidefinite. Let $m_{i_1} > 1$; then

\[
l_z \le \max_{\Sigma} \{-(m/2)\log\det[\Sigma] - (1/2)\operatorname{tr}[\Sigma^{-1}S_{i_1}]\} \tag{14}
\]

with

\[
S_{i_1} = \sum_{j=1}^{m_{i_1}} (z_{i_1 j} - \bar z_{i_1 .})(z_{i_1 j} - \bar z_{i_1 .})'. \tag{15}
\]

But, since the $z_{ij}$ are normally distributed, $S_{i_1}$ is almost surely positive
definite, as is obvious from [A 3.14]. Hence the right-hand side is bounded and
attains its maximum for (cf. [A 3.15])

\[ \hat\Sigma = S_{i_1}/m. \tag{16} \]
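
The maximizer stated in (16) can be illustrated numerically. The sketch below assumes an arbitrary positive definite matrix in the role of the scatter matrix and an arbitrary value of m; it compares the objective at the scatter matrix divided by m with its values at randomly perturbed positive definite candidates. The names `profile_objective` and `is_best_among_perturbations` are hypothetical, and the check is a plausibility test, not a proof.

```python
import numpy as np


def profile_objective(sigma, s, m):
    """Evaluate -(m/2) log det(sigma) - (1/2) tr(sigma^{-1} s)."""
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("sigma must be positive definite")
    return -0.5 * m * logdet - 0.5 * np.trace(np.linalg.solve(sigma, s))


def is_best_among_perturbations(s, m, trials=500, scale=0.05, seed=0):
    """Compare the value at s/m with values at random symmetric perturbations."""
    rng = np.random.default_rng(seed)
    sigma_hat = s / m
    best = profile_objective(sigma_hat, s, m)
    for _ in range(trials):
        e = rng.normal(scale=scale, size=s.shape)
        candidate = sigma_hat + 0.5 * (e + e.T)       # keep the candidate symmetric
        if np.all(np.linalg.eigvalsh(candidate) > 0):  # skip non-definite candidates
            if profile_objective(candidate, s, m) > best + 1e-10:
                return False
    return True


a = np.array([[1.0, 0.2], [-0.5, 1.5], [0.3, -0.7]])
s = a.T @ a          # an arbitrary positive definite 2 x 2 matrix
m = 7                # an arbitrary total number of observations
print(is_best_among_perturbations(s, m))
```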


Thus the likelihood function $l_z$ is almost surely bounded. As we will see,
the MLE also exists almost surely and can be obtained in several steps. First
the parameter region is extended and the maximum of $l_z$ determined there:

\[
\max_{\alpha, \beta, \mu_{(n)}}\ \max_{\Sigma \in M^{>}_2} l_z(\alpha, \beta, \mu_{(n)}, \Sigma). \tag{17}
\]
#.B,Pmy) EMF 


Let $\hat\alpha, \hat\beta, \hat\mu_{(n)}, \hat\Sigma$ be the solution. Let $\beta$ be fixed.
Then $\hat\Sigma$ uniquely determines a $\gamma(\hat\Sigma)$ by the relation

\[ \hat\Sigma = \Sigma(\beta, \gamma), \tag{18} \]

which is a system of linear equations for the unknowns $\sigma_\xi$, $\sigma_\delta$,
$\sigma_\varepsilon$. If $\gamma(\hat\Sigma) \in (\mathbb{R}^{\ge})^3$, then we have a
solution of the initial problem. It will be shown
that the maximization problem with the extended parameter region has almost
surely a uniquely determined solution $\hat\beta = \beta_1$, and moreover only one further
stationary point $\beta_2$. This solution was given by Cox (1976), who omitted the
elementary but complicated calculations to solve the likelihood equations.
We will obtain Cox's solution from Theorem 3.2.9 (Anderson, 1951), which
contains the multivariate form of the problem. Hence $\hat\beta = \beta_1$ is determined
as a solution of the eigenvalue problem


(19) 


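\[ (B_z - \lambda_1 W_z)\,[\hat\beta, -1] = 0, \qquad 0 < \lambda_1 < \lambda_2. \tag{20} \]

Equivalently we have

\[ (1, \hat\beta)\, W_z^{-1} B_z\, [\hat\beta, -1] = 0. \tag{21} \]

Then we can use

\[
W_z^{-1} = \det\{W_z\}^{-1} \begin{pmatrix} w_y & -w_{xy} \\ -w_{xy} & w_x \end{pmatrix}. \tag{22}
\]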


We obtain the quadratic equation given by Cox for $\beta$, where on account of the
condition $\lambda_1 < \lambda_2$ (resulting from Theorem 3.2.8) the smaller one of the
two solutions has to be chosen. This was pointed out by Theobald (cf. Cox
and Dolby, 1977).
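
The step from the eigenvalue problem (20) to a quadratic equation in the slope can be made concrete. The sketch below assumes that the two matrices are symmetric 2 x 2 matrices with entries named as in (22), the first one positive definite; the numerical values are purely illustrative. Expanding (21) with (22) gives a quadratic in the slope, and each root is paired with its eigenvalue so that the root belonging to the smaller eigenvalue can be selected. The function name `slope_from_eigenproblem` is hypothetical.

```python
import math

import numpy as np


def slope_from_eigenproblem(W, B):
    """Return (lambda_1, beta_1) solving (B - lambda W)[beta, -1] = 0."""
    W = np.asarray(W, dtype=float)
    B = np.asarray(B, dtype=float)
    w_x, w_xy, w_y = W[0, 0], W[0, 1], W[1, 1]
    b_x, b_xy, b_y = B[0, 0], B[0, 1], B[1, 1]
    # Coefficients of a*beta^2 + b*beta + c = 0, obtained by expanding
    # (1, beta) adj(W) B [beta, -1] = 0.
    a = w_xy * b_x - w_x * b_xy
    b = -(w_y * b_x - w_x * b_y)
    c = w_y * b_xy - w_xy * b_y
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        raise ValueError("degenerate case not handled in this sketch")
    roots = [(-b - math.sqrt(disc)) / (2.0 * a), (-b + math.sqrt(disc)) / (2.0 * a)]
    pairs = []
    for beta in roots:
        v = np.array([beta, -1.0])
        lam = (v @ B @ v) / (v @ W @ v)    # eigenvalue belonging to [beta, -1]
        pairs.append((lam, beta))
    return min(pairs)                       # pair with the smaller eigenvalue


W = np.array([[2.0, 0.5], [0.5, 3.0]])      # illustrative values only
B = np.array([[4.0, 3.0], [3.0, 5.0]])
lam1, beta1 = slope_from_eigenproblem(W, B)
```

Selecting by the eigenvalue rather than by the size of the root keeps the sketch tied to the condition on the eigenvalues stated with (20).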

In case the $\gamma(\hat\Sigma)$ defined from this solution is not in the admissible
parameter region, the maximum is located on the boundary. This follows from the
above-mentioned property of the extended likelihood function that there
exist only two stationary points of $l_z$, one of which is the global maximum and
the other gives the global minimum. If the maximum for the restricted
likelihood did not lie on the boundary of $(\mathbb{R}^{\ge})^3$ then there would have
to exist still another local maximum of $l_z$ with $\gamma$ in the interior of
$(\mathbb{R}^{\ge})^3$, the $l_z$-value of which exceeds all $l_z$-values on the boundary.
The corresponding value of $\beta$ would give a third stationary point of the
extended likelihood, which is impossible.

Comparing the maxima over the three boundary components of $\Gamma$ provides
the desired MLE. In a model with equal numbers of replications, Dolby (1976a)
got the corresponding equations, but he did not investigate the MLE on the
boundary. The boundary maxima of the likelihood function for $\sigma_\delta = 0$ and
$\sigma_\varepsilon = 0$, respectively, are simply obtained from weighted LSE in the
corresponding bivariate linear regression models of $y$ over $x$, and $x$ over $y$,
respectively. The boundary MLE for $\sigma_\xi = 0$ finally requires the solution of
an equation of the fourth degree in $\beta$ (cf. Cox, 1976). The MLE for LIFU* with
replications are collected in Table 3.2.1, where we have the following notation:


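\[
\begin{aligned}
B(\beta) &= b_y - 2\beta b_{xy} + \beta^2 b_x,\\
W(\beta) &= w_y - 2\beta w_{xy} + \beta^2 w_x,\\
T(\beta) &= B(\beta) + W(\beta),
\end{aligned}
\tag{23}
\]

\[
\hat\beta = \frac{w_y b_x - w_x b_y - \left((w_x b_y - w_y b_x)^2 - 4(w_{xy} b_x - w_x b_{xy})(w_y b_{xy} - w_{xy} b_y)\right)^{1/2}}{2(w_{xy} b_x - w_x b_{xy})}, \tag{24}
\]

\[ p = (\beta w_x - w_{xy})/W(\beta), \tag{25} \]

\[ q = (w_y - \beta w_{xy})/W(\beta). \tag{26} \]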
Table 3.2.1. MLE for LIFU*+ with replications 
para- inner MLE o, = 90 6, =0 o: =0 
meter 
B B BxylOz by|Oxy B 
8 WG. —9..+ BB.) +9%), Fi, %,.+(beylb2) ¥i.—9..) 9; 
Oé (Way —pqB(B) \/B Wy wybiy/b5 0 
05 pT (B)/B 0 BE Org Oy (Wyt(8b,—B gy?)/B( ) 
O; qT(B) b= 02 ,/Oe 0 Wy +(by—Bbxy)/B(B) 
For o ;= 0 we obtain f as the solution of the equation 
—y W, (Bb, Tc) bry) | BiB) = (by rT Bb,,) (Bbzy) (by icc pb,). (27) 
In this case, 
1. = —m(1 + log 2x) —— | Wy + Bho aa 
ie E ‘ BO) 


(b, ae, Boy)? 28 
«fon + Bp). }) oe 




For o; = 0 and o, = 0, we have 


ue 

Ll; = —m(1 + log 2x) — ce log (». ( y— ald (29) 
2m b, 
1 b2 

1, = —m(1 + log 2x) — — log |w, |b, — —*)). (30) 
2m by 


Finally, let 


Hence, if the MLE is negative for one of the variances $\sigma$, we have to choose
that boundary MLE for which the corresponding likelihood value is maximal.
From these equations we obtain on a unified basis, by further restrictions on
the parameters, MLE for special LIFU and statements connected with this
that have been derived in different ways before.


1. o, = 0 provides the LIFU* with repeated observations of an experimental 
design. Here one obtains 


é; = S3/m, a, = S;/m. (32) 


Barnett (1970) obtained an implicit system of equations without giving 
explicit formulae. 

2. For $\mu_i = \mu$ one obtains a LIFU- model known as a ‘structural relation’.
One obtains (cf. Dolby, 1976a, p. 43):

\[ \hat\mu = \bar x_{..}, \qquad \hat\alpha + \hat\beta\hat\mu = \bar y_{..}. \tag{33} \]

(In the classical approach (Kendall and Stuart, 1961, p. 379) this equation
is obtained by a heuristically founded ‘sufficiency argument’, which fails
for known $\sigma_\delta$, $\sigma_\varepsilon$; see Remark 3.2.2.)


Chan and Mak (1979b) treat another interesting model with a different 
structure of replications: 


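\[
\begin{aligned}
\eta_i &= \alpha + \beta\xi_i, & \xi_i &\sim N(\mu, \sigma_\xi),\\
x_{ij} &= \xi_i + \delta_{ij}, & \delta_{ij} &\sim N(0, \sigma_\delta),\\
y_{ij} &= \eta_i + \varepsilon_{ij}, & \varepsilon_{ij} &\sim N(0, \sigma_\varepsilon),
\end{aligned}
\]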


where $j = 1, \dots, m_0$ and the random variables $\xi_i$, $\delta_{ij}$, $\varepsilon_{ij}$ are
independent. Note, however, that this model cannot be obtained as a special case
of the previous one. This is immediately obvious if we calculate the covariance
$D(x_{ij}, x_{ik})$ for $k \ne j$ in both models. In the starting model we have 0, whereas
$\sigma_\xi$ occurs in the one mentioned last. The MLE satisfy the equations (33) that
were already obtained in the above model for the case of $\mu_i = \mu$. The MLE
for $\beta$ is one of the roots of a polynomial of fourth degree. The boundary MLE
are not studied in Chan and Mak (1979b).


3.2.1.3 Observations without replications 


If there are no replications, then $m_i = 1$, $i = 1, \dots, n$, $z_{i1} = z_i$. In this case
the solutions of the likelihood equations were discussed by Dolby (1976a), where
the details of the following discussion are to be found. Without further additional
assumptions it results from the likelihood equations that

\[ 0 = (\sigma_\varepsilon + \beta^2\sigma_\delta)/\det[\Sigma]. \tag{34} \]

This equation has no solution for real $\beta$ (see Table 3.2.2). If the variance ratio
$\sigma_\xi/\sigma_\delta =: \rho_1$ is assumed to be known, an equation of the form
$0 = g(\beta, \sigma_\delta, \sigma_\varepsilon)$ follows which in general cannot be satisfied by
consistent estimators $\hat\beta$, $\hat\sigma_\delta$, $\hat\sigma_\varepsilon$. From this, for LIFU*, i.e.
for $\rho_1 = 0$, the equation $\beta^2\sigma_\delta = \sigma_\varepsilon$ results, which was
derived already by Lindley (1947). The unboundedness of the likelihood func- 


Table 3.2.2. MLE for bivariate normal LIFUt 


Ej, ~~ is Or)s 03; ~~ N(0, 5), &;; ~~ N(0, 6), 01 = 6:/05, 02 = G,/c5 


with replications without 
unequal Table 3.2.1 without additional “1? MLE* 
number of information 
replications 0; OF g, known — MLE 
Q, and e, known (35), (37) 
0, — 0,0, +0 (38) 
Cio; 
(33) O,, 63 known (39) 
01 known (40) 
LIFUt (o; = 0, = 0) 
equal (32) without additional “7 MLE 
number of information 
F plications o, or o, known “7 MLE 
0, known (37) 
o; and o, known, B from (39) 


Spee ee eee) ess oe eer 


* — MLE: there exists no global maximum of the likelihood function 




tion for these LIFU* (Anderson and Rubin, 1956, pp. 129—130) corresponds to 
the inconsistency of MLE shown by Lindley. In fact, the likelihood equations 
determine only a saddle-point (Solari, 1969). 

It is also insufficient to know the ratio $\rho_2 = \sigma_\varepsilon/\sigma_\delta$, since in this
case equation (34) follows in the same way. But if both ratios are known, the
following likelihood equations are obtained. They are also valid for LIFU*, i.e.
in the case $\rho_1 = 0$, and when all variances are known. We denote

\[
\begin{aligned}
t &= 1/(\sigma_\varepsilon + \beta^2\sigma_\delta),\\
g(\beta) &= d_{xy}\beta^2 + (\rho_2 d_x - d_y)\beta - \rho_2 d_{xy},\\
e &= y - \alpha 1_n - \beta x.
\end{aligned}
\tag{35}
\]
Then it holds that 


68=a-+ Bé,te, (36) 
O = osBlle.|!? (02 + B?) + 2n(01(e2 + 82) + @2) 9(B)- 


Under $\rho_1 = 0$, for $\beta$ one obtains the equation $0 = g(\beta)$, which has been
known for a long time. Madansky (1959) showed that the greater of the two
solutions provides the MLE:

\[
\hat\beta = \left((d_y - \rho_2 d_x) + \left((d_y - \rho_2 d_x)^2 + 4\rho_2 d_{xy}^2\right)^{1/2}\right)/2d_{xy}. \tag{37}
\]
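
Formula (37) is easy to evaluate directly. The following sketch assumes that the d-quantities are the centred sums of squares and cross products of the observations (an assumption made here; since (37) is unchanged when all three are multiplied by the same constant, any common normalization gives the same slope). The function name `mle_slope_known_ratio` is hypothetical.

```python
import numpy as np


def mle_slope_known_ratio(x, y, rho2):
    """Slope from (37) for a known ratio rho2 of the two error variances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d_x = np.sum((x - x.mean()) ** 2)
    d_y = np.sum((y - y.mean()) ** 2)
    d_xy = np.sum((x - x.mean()) * (y - y.mean()))
    if d_xy == 0.0:
        raise ValueError("formula (37) needs a nonzero cross product d_xy")
    t = d_y - rho2 * d_x
    return float((t + np.sqrt(t * t + 4.0 * rho2 * d_xy ** 2)) / (2.0 * d_xy))


# Error-free points on the line y = 2x + 1: the slope is recovered for any rho2 > 0.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
slopes = [mle_slope_known_ratio(x, y, rho2) for rho2 in (0.5, 1.0, 4.0)]
```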


In case of only one known error variance under LIFU* the likelihood function
is unbounded, except for the regression cases $\sigma_\varepsilon = 0$ or $\sigma_\delta = 0$
(Moberg and Sundberg, 1978). For $\rho_2 = 0$, $\rho_1 \ne 0$, which might be considered
as a ‘quasi-regression model’, one obtains (cf. Dolby 1976a):

\[ \hat\beta = \operatorname{sign}(d_{xy})\,(d_y/d_x)^{1/2}. \tag{38} \]


This is the geometric mean of both regression estimates of x over y and y 
over x, respectively, which was heuristically derived by Teissier (1948). 
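
The geometric-mean property just mentioned can be checked on a small data set. The sketch assumes centred sums of squares and cross products and made-up observations; the function name `geometric_mean_slope` is hypothetical.

```python
import numpy as np


def geometric_mean_slope(x, y):
    """Return the geometric-mean slope and the two regression slopes."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d_x = np.sum((x - x.mean()) ** 2)
    d_y = np.sum((y - y.mean()) ** 2)
    d_xy = np.sum((x - x.mean()) * (y - y.mean()))
    if d_xy == 0.0:
        raise ValueError("the two regression slopes need a nonzero d_xy")
    b_y_on_x = d_xy / d_x            # regression of y on x
    b_x_on_y = d_y / d_xy            # regression of x on y, written as a slope in x
    beta = np.sign(d_xy) * np.sqrt(b_y_on_x * b_x_on_y)
    return float(beta), float(b_y_on_x), float(b_x_on_y)


x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # made-up observations
y = np.array([0.1, 0.9, 2.3, 2.8, 4.2])
beta, b_y_on_x, b_x_on_y = geometric_mean_slope(x, y)
```

The product of the two regression slopes equals the ratio of the two sums of squares, so the cross product enters only through its sign.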


Case $\mu_i = \mu$, $i = 1, \dots, n$


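If both error variances are known and $\mu_i = \mu$, then, according to Barnett
(1967), one obtains the MLE from the equations

\[
\hat\beta = \left((d_y - \rho_2 d_x) + \left((d_y - \rho_2 d_x)^2 + 4\rho_2 d_{xy}^2\right)^{1/2}\right)/2d_{xy},
\]

\[ \hat\mu = \bar x, \qquad \hat\alpha = \bar y - \hat\beta\bar x, \tag{39} \]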


> 


6: = (01d, + 20,04, + B2dy) — 0,(6205 + o,)/(0, + B2)?. 




Barnett, however, took no notice of the possibility $\hat\sigma_\xi < 0$. In this case the
MLE lies on the boundary ‘$\sigma_\xi = 0$’. This corresponds to $\rho_1 = 0$, and the MLE
is obtained as in the model in which the $\mu_i$ can be different. But, finally, the
same formula (37) results for $\hat\beta$. With regard to this model there was
considerable discussion of the ‘overidentified’ likelihood equations (cf. Kendall and
Stuart, 1961, 29.9—11). They argued that the MLE could be obtained by 
‘solving those equations which equate sample values with the theoretical ex- 
pectations’. This would be based on the sufficiency of sample values for para- 
meters. But, in the case that both error variances are known, five equations 
are obtained for only four parameters, and thus there is an ‘overidentifiability’. 
Kiefer (1964) referred to the fact that the correct application of the likelihood 
principle does not cause overidentifiability. 

Birch (1964) also started from overidentified equations. He applied the
likelihood principle correctly only in those cases where these equations did not
have a solution. In some of his complicated derivations he found the same solution
as Barnett (1967) but, apparently, he did not correctly treat the case $\hat\sigma_\xi = 0$
(cf. Birch, 1964, situation (vi), p. 1176).

If 9, is known, one obtains the same formula for f. The solution for 6; is: 


(cf. Madansky, 1959). 

The formula for $\hat\beta$ remains valid for LIFU*, i.e. for $\rho_1 = 0$. If all variances
are unknown and $\alpha$ is known, the MLE was derived by Chan and Mak (1979a).
Similar to the model with replications, here the MLE is obtained by
investigating the stationary points and the maxima on the boundaries of the
parameter region $\sigma_\xi = 0$, $\sigma_\delta = 0$, and $\sigma_\varepsilon = 0$.


3.2.2 Maximum likelihood and least squares estimators 
3.2.2.1 Estimation procedures for models with errors-in-variables


In Section 3.2.1 the MLE were investigated for the simplest bivariate linear
models. In important special cases explicit formulas were obtained to estimate
the structural parameter. For more general models this cannot be expected.
In this section the two most important estimation procedures for models with
errors-in-variables are described. Both result from the minimization
of an estimation functional over the parameter space


\[ \hat\vartheta(z) = \arg\min_{\vartheta \in \Theta} l_z(\vartheta). \tag{41} \]


For more general models with errors-in-variables we had y = [mn), 2, y] 
€ S, < I’. For explicit models we have y = [&(n), %, 7]. 




3.2.2.2 Maximum likelihood estimation 


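Let the densities (or their logarithms) of the distribution of $\zeta_{(n)}$ be given by
$\ell_\gamma(\cdot)$, $\gamma \in \Gamma$. Then the MLE for $\vartheta$ is obtained from the
estimation functional

\[ l_z(\vartheta) = -\ell_\gamma(z - \mu). \tag{42} \]

Under normal distribution with unknown covariance $\Omega = \Omega(\gamma)$, $\gamma \in \Gamma$, we
have

\[ l_z(\vartheta) = \log\det[\Omega]/2 + \|z - \mu\|^2_{\Omega^{-1}}/2. \tag{43} \]

The MLE result from

\[ \hat\vartheta = \arg\min_{[\mu, \tau] \in S_\tau}\ \min_{\gamma \in \Gamma} l_z(\vartheta). \tag{44} \]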


3.2.2.3 Least squares estimation 


With the Euclidean norm $\|\cdot\| = \|\cdot\|_I$, the method consists in first defining the
sum of the squared distances between the observations $z_i$ and a structure
$\mathrm{St}_\tau$, fixed for the present. We then search for the structure from the
model which minimizes this distance with respect to the state variables
observed with errors. Let $\vartheta = [\mu_{(n)}, \tau]$. Then the LSE is obtained from


6 = arg min [len — Mn l?,_——- (45) 


[u,7]ES, 


where w = wn) is the known vector of regressors. This minimization can also 
be written as 


$= argmin min |e — pl: (46) 
’ TEM BG) ESwx 

As is well known from regression analysis, quadratic distances different from
the Euclidean are sometimes more suitable. Let $W$ be a positive definite
weighting matrix and $\|\cdot\|_W$ the corresponding quadratic norm. This yields
weighted LSE (WLSE) with weighting matrix $W$. If $W^{-1}$ is the true covariance of
$\zeta_{(n)}$, generalized LSE (GLSE) are obtained. As will be seen, the LSE for the
regression models are included in a natural way (cf. equations
(3.1.50)-(3.1.53)). Obviously, the following statement holds.


Theorem 3.2.1 For normally distributed errors $\zeta_{(n)}$ with known covariance the
GLSE is MLE.


Remark 3.2.1 For LIFU*, Sprent (1966) introduced another method also 
called generalized LSE. With the residual variable é = y,, — (I, © B) He 




the following ‘Least squares criterion’ can be introduced heuristically : 


k,(a) = min é'(H, 86’) é. (47) 
nell 

Dolby (1972) showed the equivalence of this estimate with the WLSE for known
covariance under normal error distribution. A graphic interpretation of
WLSE is possible for the case $\Omega = I_n \otimes I_p$ (cf. Figure 3.2.1). Here
$\mathrm{St}_{\hat\tau}$ has to be exactly that structure from the model which minimizes the sum
of squares of the perpendicular distances from the observed points $z_i$ to
$\mathrm{St}_\tau$. The WLSE with this special weighting matrix is called orthogonal LSE
(ORLSE). For bivariate LIFU* the WLSE lies between both lines of regression (cf.
Figure 3.2.2) for every nonsingular weighting matrix.


Fig. 3.2.1.

Fig. 3.2.2.


The regression lines are the WLSE for $\sigma_\delta = 0$, which leads to $\hat\beta_{y|x}$, and
for $\sigma_\varepsilon = 0$. For these lines, the squares of orthogonal distances are no longer
minimized, but rather the sum of the squares of distances in the direction of
the $\eta$- and $\xi$-axes, respectively. The orthogonal LSE lies between both lines
of regression, which is to be seen from the computation formula. We have (cf.
(3.1.37)),

\[
\hat\beta = \left((d_y - d_x) + \left((d_y - d_x)^2 + 4d_{xy}^2\right)^{1/2}\right)/2d_{xy},
\]

\[ \hat\beta_{y|x} = d_{xy}/d_x, \qquad \hat\beta_{x|y} = d_y/d_{xy}. \]


Using the inequality $d_{xy}^2 \le d_x d_y$ (Cauchy’s inequality) implies that the
radicand of the root term occurring for $\hat\beta$ is not greater than

\[ (d_y - d_x)^2 + 4d_x d_y = (d_x + d_y)^2. \tag{48} \]

Thus for $d_{xy} > 0$,

\[ \hat\beta \le \hat\beta_{x|y}. \tag{49} \]


The other cases are treated in the same manner. 
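
The position of the orthogonal LSE between the two regression lines can be illustrated numerically. The sketch below assumes centred sums of squares and cross products, made-up observations, and a positive cross product; the function name `three_slopes` is hypothetical.

```python
import numpy as np


def three_slopes(x, y):
    """Return the slope of y on x, the orthogonal LSE slope, and the slope of x on y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d_x = np.sum((x - x.mean()) ** 2)
    d_y = np.sum((y - y.mean()) ** 2)
    d_xy = np.sum((x - x.mean()) * (y - y.mean()))
    if d_xy <= 0.0:
        raise ValueError("this sketch only covers the case d_xy > 0")
    b_y_on_x = d_xy / d_x
    b_x_on_y = d_y / d_xy
    diff = d_y - d_x
    b_orth = (diff + np.sqrt(diff * diff + 4.0 * d_xy ** 2)) / (2.0 * d_xy)
    return float(b_y_on_x), float(b_orth), float(b_x_on_y)


x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # made-up observations
y = np.array([0.1, 0.9, 2.3, 2.8, 4.2])
b_y_on_x, b_orth, b_x_on_y = three_slopes(x, y)
```

For a negative cross product the ordering of the three slopes is reversed, which corresponds to the other cases mentioned in the text.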


3.2.2.4 Measurability and uniqueness 


At this point we will deal briefly with measurability of MLE and WLSE. These 
are obtained under weak assumptions in the general context of minimum- 
contrast estimators (cf. Pfanzagl, 1969; Strasser, 1973; and the literature 
mentioned there). It is sufficient that $l_z$ is continuous in both the arguments
$\vartheta$ and $z$ and that for every $z$ the minimum is attained for a finite parameter
value. Even if the estimate $\hat\vartheta$ is not unique since several minima exist, we can
choose one of them to obtain a measurable function of $z$ (cf. Witting and Nölle,
1970, 3.32). The notation

\[ \hat\vartheta = \arg\min_{\vartheta} l_z(\vartheta) \]


should also be understood in this sense — as a measurable choice of the mini- 
mum. Anyhow, in practical cases the assumptions of the general theorems can 
mostly be taken as fulfilled. For this reason we will not go into further details 
with respect to the question of measurability. 

The uniqueness of LSE for very general curve-fitting models has been shown 
in the fundamental paper of Pazman (1984). The proof is based on arguments 
from differential geometry and therefore beyond the scope of this book. The 
methods for previous less general results by Höschel (1978b) have been
extended in another direction. They are used to show the global identifiability
of the structural parameter in most practically applied models (cf. Theorems
3.1.4 and 3.1.5). For LIFU*, measurability and uniqueness of MLE follow
directly from their representation as solutions of certain eigenvector problems,
as can be seen in the following section.


3.2.3 Linear functional relations with nonrandom experimental design 
and known covariance 


3.2.3.1 The model 


In Section 3.2.1 we showed that a unified approach to bivariate LIFU is 
possible. At least for normally distributed $\xi_i$ this possibility would also exist
for the multivariate case. The method described in Section 3.2.1 for the
computation of stationary points of the maximum likelihood equations can in
principle also be used for the multivariate case. However, until now
multivariate LIFU- has not achieved the importance of the corresponding LIFU*.
This is, of course, caused by the more complicated computations necessary to
determine the MLE even in the normal case. The basic knowledge of the
difficulties of this model was already provided by the bivariate model (cf. Section
3.2.1). Moreover, in contrast to the case of univariate $\xi_i$, the assumption of a
multivariate normal distribution for the experimental design is practically less
important. As already indicated (cf. equation (3.1.50)), it is difficult to define
the distribution of $z = \mu + \zeta$ for nonnormal $\mu$. That is why the literature on
multivariate LIFU has nearly exclusively treated LIFU with nonrandom
experimental design. The MLE derived in Section 3.2.1 for LIFU* result as
special cases.

LIFU* with known covariance are treated in this section. The assumption 
of a known covariance is not satisfied for most practical applications, to be 
sure, but the corresponding investigations provide devices for the construction 
of two-step estimators. Here we will also deal with the case of a general
positive definite covariance $Dz = \Omega$. In contrast to the case of uncorrelated
single observations, $Dz = I_n \otimes \Sigma$, time series and other models can then be
covered.


3.2.3.2 Least squares estimation 
Consider the LIFU* (cf. (3.1.35)) 
ea An ee ee eka, ,, Ds =a. (50) 


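With the distribution assumption $\varepsilon \sim N(0, \Omega)$, the MLE and WLSE (with
weighting matrix $\Omega^{-1}$) coincide as shown in Section 3.2.2. They are obtained
from

$$ \min_{\mu} k_z(\mu) = \min_{\mathcal{L} \in \mathfrak{L}_{\le p-q}} \; \min_{\mu \in \mathcal{L}^n} k_z(\mu) \qquad (51) $$

with

$$ k_z(\mu) = \| z - A\mu \|^2_{\Omega^{-1}}. $$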
There exists an inner minimum for fixed # € C<,_, at the point 


f(L) = Q-2. AML) = Pgsng gn Q-U*2. (52) 


As the projection continuously depends on f and since Y<p-, is compact 
according to Dieudonné (1976, 16.11.9), we have the following result. 


Theorem 3.2.2 For the LIFUt, the WLSE is obtained from 


k,(w) = |lellg-1 — max ||u()ll7,, 
Lel<p—q 
(The WLSE for a special unknown error covariance with LIFU* will be given 
in Theorem 3.2.9). 




For the case Q = A ® 2, A = U' & I,, a reduction to an eigenvalue prob- 
lem is possible, then we have 
Pong gn = Pyanyex-ng = Pasay: © Prang. 
But because of ||(4 @ B) z||} = tr (B’Bz(A'A) 2’) with Z = z we have 
IWa(£)|? = te (ZY2P yn gEAPZAAPP pay AUZ!). 


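Now let $C = (c_1, \ldots, c_{p-q})$ be an orthonormal basis of $\Sigma^{-1/2}\mathcal{L}$. Then the WLSE
is obtained by the solution of the problem

$$ \max_{C'C = I_{p-q}} \operatorname{tr}\,(C' \Sigma^{-1/2} Q\, \Sigma^{-1/2} C) $$

with

$$ Q = Q_{Z.U} = Z \Lambda^{-1} U' (U \Lambda^{-1} U')^{-1} U \Lambda^{-1} Z'. $$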


The solution is well known (cf. Rao, 1973, I, 1. f. (iv)) in case that 7(Q) = p — q. 
Then we have 


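$$ \hat{C} = C^* $$

where

$$ C = (C_*, C^*) = (c_1, \ldots, c_q, c_{q+1}, \ldots, c_p) $$

and the $c_i$ are the eigenvectors of $\tilde{Q}$ belonging to the eigenvalues

$$ \lambda_1(\tilde{Q}) \le \cdots \le \lambda_p(\tilde{Q}), \qquad \tilde{Q} = \Sigma^{-1/2} Q\, \Sigma^{-1/2}. $$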


Now let n = p. With 7(P 424) 2 p and consequently because of the absolute 
continuity of the distribution P*, the random matrix ZP 4-12,Z' has full rank 
almost surely and distinct eigenvalues (cf. [A 3.14]). This implies that 
L (Cp, =.+5 Cgi) 18 uniquely determined almost surely and in the model LU pog 
the maximum is attained almost surely for r = DRE 


Theorem 3.2.3 For LIFU* with A=U'@I,, n2=p, Q=A@Z the 
GLSE for the structural parameter £ € Le,_, 1s almost surely 


He —- DIE (Oy DOr) Cosi)» 


where Cp, ..-, Cai, are the eigenvectors belonging to the p — q greatest eigenvalues 
Of, Sah ?2 ACME ER eta gtee een caae, P? is uniquely determined almost surely. 


The value of 1, (A) is 
: | 
log 1,(2) = OP log 2 = & los det [A] det fst = AQ). 
2 Py 2 j=qt1 


(For this compare Remark 3.2.4 in Section 3.2.5.)




The fact that, for the structural bundle $\mathfrak{L}_{\le p-q}$, the WLSE $\hat{\mathcal{L}}$ has almost surely
the largest admitted dimension leads to inconsistent estimates in models in
which the 'true' structural parameter in fact has a smaller dimension than
$p - q$. Furthermore, notice that Q is closely connected with the regression
model

$$ Z = MU + E. $$

Here the BILUE of M is

$$ \hat{M} = Z \Lambda^{-1} U' (U \Lambda^{-1} U')^{-1}, $$

so that

$$ Q = \hat{M}\, U \Lambda^{-1} U'\, \hat{M}' = Q_{Z.U}, $$

and with the matrix of the regression residual $S_{Z.U}$ it holds that

$$ Q_{Z.U} + S_{Z.U} = Z \Lambda^{-1} Z'. $$


Thus the results on the theory of multivariate linear regression form the basis 
for the treatment of LIFU*. (This is similar to the case of unknown covariance 
matrices.) 

For completeness let the MLE be given for the model (3.1.34). Due to results 
in Section 3.2.2, the MLE for f+ is 


go O. 
Now apply 


(LU27) 4 es Dy Supiyenl 
and 
R(C*)* = RC,) 


Since the eigenvalues are almost surely different, one obtains the following 
result. 


Theorem 3.2.4 For LIFU* 
2=(U'@i)ntt, Doe—A@z, 
iO, 1 Dt ete 
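the WLSE is obtained for $L^{\perp}$ in the form of

$$ \hat{L}^{\perp} = \Sigma^{-1/2} C_*, $$

where $C_* = (c_1, \ldots, c_q)$ are the eigenvectors belonging to the q smallest eigenvalues
of $\Sigma^{-1/2} Q\, \Sigma^{-1/2}$. With $\hat{L}^{\perp}$, any other matrix $L \in M_{p \times q}$ with $R(L) = R(\hat{L}^{\perp})$ is also a WLSE.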


Nevertheless, R(L) is almost surely uniquely determined. 
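
The eigenvector representation of Theorem 3.2.4 can be illustrated numerically. The following Python/NumPy lines are only a sketch under simplifying assumptions that are not part of the text: $U = I_n$ and $\Lambda = I_n$ (so that $Q = ZZ'$), a hypothetical known $\Sigma$, and an error-free design lying exactly in a (p - q)-dimensional subspace. The names `inv_sqrt` and `wlse_known_cov` are introduced here for illustration.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def wlse_known_cov(Z, Sigma, q):
    """Sketch of L_perp = Sigma^{-1/2} C_* for Z (p x n) and known Sigma."""
    T = inv_sqrt(Sigma)
    Q = Z @ Z.T                          # assumption: U = I_n, Lambda = I_n
    lam, C = np.linalg.eigh(T @ Q @ T)   # eigenvalues in ascending order
    L_perp = T @ C[:, :q]                # eigenvectors of the q smallest eigenvalues
    return L_perp, lam

rng = np.random.default_rng(0)
p, q, n = 3, 1, 50
basis = rng.normal(size=(p, p - q))          # hypothetical true subspace
M = basis @ rng.normal(size=(p - q, n))      # error-free design in that subspace
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])          # hypothetical known covariance

L_perp, lam = wlse_known_cov(M, Sigma, q)
residual = np.abs(L_perp.T @ M).max()        # how far L_perp' M is from zero
print("smallest eigenvalue:", lam[0], " max |L_perp' M|:", residual)
```

For error-free data the q smallest eigenvalues vanish and the estimated normal directions annihilate the design; with errors the same computation yields the estimate, and only the range of `L_perp` is determined.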




We have 
log L(A) = ay log 2x — — 5 lee det [A] det [2] — =. s A(Q 


4=1 
This also holds for the model 
Eb” € Oregtpxa: 


Notice that $\hat{L}^{\perp} = \tilde{C}_*$ if $\tilde{C}_*$ consists of the eigenvectors of $(Q - \lambda_i \Sigma)\,\tilde{c}_i = 0$
belonging to the q smallest eigenvalues. For general covariances the computation
is not so easy but is possible in principle due to Theorem 3.2.3. The MLE then have
to be computed with the algorithms for general models (cf. Sections 3.2.6 and
3.2.7). The following theorem for the LIFU* with $\mathcal{L} \in \mathfrak{L}_{\le p-q}$ helps to shorten
the computation, since the dimensions less than $p - q$ need not be considered.
The theorem is formulated for WLSE. 


Theorem 3.2.5 Let P* be an absolutely continuous distribution, n = p(p — q). 
Then the WLSE ? of £ € Say-q is almost surely contained in B_ pq 
Proof. Assume a WLSE # € &,,7r < p — q would exist. Then there exist some 
subspaces /,, £, € 2,., with ¥ —F,,£ —f,, F, + F£., such that 


he (f; n £,)". 


Since f should be the WLSE for the whole model Y,-, it follows, taking 
into consideration # < f;; 7 = 1, 2, that 


ued; uel” 
Because of that, and since 2.,_ -9 = Lup» both #, and #, would be WLSE in 
Q_,_¢- Hence, because of w € (£, 9 £2), the WLSE A(z) for w would not be 


identifying for the structural parameter f. 
This can be the case only on a zero set of z (cf. Höschel, 1978a, theorem 6.2).


Theorem 3.2.6 Let any distribution from P* be absolutely continuous. Then 
the WLSE for # € S<y-, are obtained as Lf = KL+)+ where 


palettes)” fe sim fal ech=) 


He xa 
holds and $\hat{\mu}(\mathcal{L})$ is defined according to (52) (cf. Nussbaum, 1976, theorem 4.3.1;
Höschel, 1978a).


The use of this result consists in the fact that the minimum problem does
not have to be treated for all matrices $L^{\perp}$ from the sets $M_{p \times r}$, $r \ge q$, but only
for those with $r = q$. Thus it is possible in principle to compute the WLSE, but
in the case $\Omega \ne \Lambda \otimes \Sigma$ the computation of the WLSE is difficult.




3.2.3.3 Equivariance


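The equivariance of WLSE can also be proved for general covariance $\Omega$.
First define equivariance for the case $A = U' \otimes I$. Let $\mathfrak{G} \subset M_{p \times p}$ be a group
of regular linear transformations on $\mathbb{R}^p$. Under $G \in \mathfrak{G}$ the LIFU* model transforms
from (cf. (3.1.38))

$$ Z = MU + E, \quad D\varepsilon = \Omega $$

to

$$ GZ = GMU + GE $$

or

$$ \tilde{Z} = \tilde{M}U + \tilde{E}, \quad D\tilde{\varepsilon} = G_n \Omega\, G_n', \quad G_n = I_n \otimes G. $$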


Now it is sensible to demand that the estimate correspondingly follows that 
transformation G over the structural space. Above all, that is the case for 
models which are formulated coordinate-free with geometrical invariant 
terms. In that case equivariance would be defined by 


where the index $\Omega$ expresses the dependence of the estimate on the model
parameter $\Omega$. This definition is especially obvious for the case $A = U' \otimes I$.
For general $A \in M_{mp \times np}$, $m \ge n$, the following commutation property has to be
assumed:

A) GP, OG) A, VGES. (53) 


Then the following theorem holds. 


Theorem 3.2.7 The WLSE are equivariant for every group & of regular linear 
transformations with the commutation property (53). 
Proof. Due to Theorem 3.2.3 the WLSE of the transformed LIFU* are solu- 


tions of 
| P5-r0 gen O22I[” = Ppa 4 gnQ¥!22|2, VE € aa (54) 
Now it holds that 
B71 = (G,,26,)-) = W014), 
(55) 
7 amet OPA a CoN oe Gr. 


Then, with the commutability (53) the well-known representation of projectors 
P, = L(L'L)" L, R(L) = £ for all # € &,-, implies 


| Bote ge 2 tz = |Po- agri enQ-122||? (56) 




or, similarly, because of G)1£" = (G-1L)", 


[Prnalead Met PParn ash 7 


for all £ — R? with # = G-1Q,_, for at least one £ € &, 4. But, the last set is 
Q,-q itself. This implies 


Po = GIL 5. Pa 
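
The statement of Theorem 3.2.7 can be checked numerically in the simplest setting. The sketch below is an illustration only and assumes $A = I_n \otimes I_p$ and $\Omega = I_n \otimes \Sigma$, so that the estimate follows from the eigenvectors of $\Sigma^{-1/2} ZZ' \Sigma^{-1/2}$; the matrices `Sigma` and `G` and the helper names are hypothetical. It verifies that the structural space estimated from the transformed model $(GZ, G\Sigma G')$ is the image under G of the space estimated from $(Z, \Sigma)$.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def structural_basis(Z, Sigma, q):
    """Return a basis L of the estimated structural space and a basis L_perp of its normals."""
    T = inv_sqrt(Sigma)
    lam, C = np.linalg.eigh(T @ (Z @ Z.T) @ T)   # ascending eigenvalues
    L_perp = T @ C[:, :q]                        # Sigma^{-1/2} C_*
    L = np.linalg.inv(T) @ C[:, q:]              # Sigma^{1/2} C^*
    return L, L_perp

rng = np.random.default_rng(1)
p, q, n = 3, 1, 40
Z = rng.normal(size=(p, n))
A0 = rng.normal(size=(p, p))
Sigma = A0 @ A0.T + np.eye(p)                    # hypothetical error covariance
G = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])                  # regular transformation (det = 7)

L, _ = structural_basis(Z, Sigma, q)
_, L_perp_G = structural_basis(G @ Z, G @ Sigma @ G.T, q)

# Equivariance: G L must be orthogonal to the normals found in the transformed model.
defect = np.abs(L_perp_G.T @ (G @ L)).max()
print("equivariance defect:", defect)
```

The defect is zero up to rounding, because the generalized eigenproblem for $(GQG', G\Sigma G')$ has the same eigenvalues as that for $(Q, \Sigma)$ with eigenvectors transformed by $(G')^{-1}$.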


3.2.3.4 Linear functional relations with nonrandom nonobservable variables 
and linear regression part 


Concluding this section we still remark on the computation of WLSE in general 
LIFU* with linear regression part (model (3.1.38); consider the identifiability 
condition of Theorem 3.1.3). Then we have 


k(u, w) = min min |e —(U’ @ Ip) wu —(V' @ Ip) wl (58) 


eMeQe_ | uci, 


For 2 = A ® & the inner minimization problem is an ordinary multivariate 
linear regression. It follows that 


A! = P prnyry yn QZ, (59) 
consequently 

jue) = M,(z, ph, Q) = D-U2Z A-1V"(VA-1V")-2 
with 

DTM ee 


and thus, in a compact way of writing (cf. 3.1.3), 


k,(a) = min ||2 — (U' @ Ip) wo || (60) 
‘ wegr 
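with

$$ \tilde{Z} = Z - Z \Lambda^{-1} V' (V \Lambda^{-1} V')^{-1} V =: Z/V $$

and

$$ \tilde{U} = U - U \Lambda^{-1} V' (V \Lambda^{-1} V')^{-1} V =: U/V. $$

With this new model we can proceed as above. (For further modifications of
$A_1$ and $A_2$ one proceeds analogously (Höschel, 1978a).)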


3.2.4 Linear functional relations with nonrandom experimental design 
and covariance known up to a factor 


As explained in Section 3.2.1.3 the likelihood function can be unbounded 
for LIFU* if no further restrictions are set on the parameters. In these cases 
no MLE exist and MLS yield inconsistent estimators in general: In bivariate 


LIFU* it is not sufficient to know one variance ratio; only with two known 
variance ratios one obtains MLE. Thereby we get LIFU* if o;/o, = 0, or 
o; = 0. Then the knowledge of the other ratio 05/0, is equivalent to the know- 
ledge of the covariance up to a factor. In case of general multivariate LIFU*+ 
this additional information is also sufficient for the boundedness of the likeli- 
hood function, as will be shown in the following. But this does not imply the 
consistency of MLE. Whereas consistency of MLE holds for the structural 
parameters, the corresponding estimates of the covariance factor are incon- 
sistent. For bivariate LIFU* this is to be found in Kendall and Stuart (1961, 
29.19), and for multivariate LIFU* in Gleser and Watson (1972). 

Now we show boundedness of the likelihood function and then we will give 
the MLE. We consider LIFU* with linear regression part (cf. (3.1.38)). Then, 
for DOE = o-I,o€ R> we have 


I.(u,0) & o®? exp (—lle — Apl|2-s/2o} 
with 
Kan) = [M4(n,)1 Mn,)2] € aK IRES 


ee Ae (AG | Ae Wenn 68 


According to Theorem 3.1.3, $\mathcal{L}$ is identifiable if $r[A] = np$. To prove boundedness
of $l_z$, it suffices to show that the vector norm occurring in $l_z$ has a positive
lower bound.


1. case: r[A] < mp 


This case describes replicated measurements for $A = U' \otimes I_p$, $U' = \mathrm{Diag}(1_{m_i})$.
If $r[A] < mp$ then at least one $m_i > 1$.
Thus $R(A)$ is a proper subspace of $\mathbb{R}^{mp}$. But then we have


je — Aliza > lle — PP2/3-+ = 0. 


2. case: [A] = mp 


Because of $r[U] = n = m$, the matrix U is regular and thus $z = A\mu$ is equivalent
to $\tilde{z} = A^{-1}z = \mu$.
But 

= [Meni Menges Mey © £5, LE Sapa? 


According to [A 3.14], for n, = p — q, the first subvectors of 2 are almost 
surely not contained in any subspace £ € <p, since with P* < /?™ we 
also have P* < 4?™. Consequently 2 is almost surely different from yu and 
according to [A 3.14], for w € £" X IR", £ € Lapa 


max 1,(u, o) = max max 1,(u, 0), 
L,o u o 


where the maximum is attained for 


A 


1 
6(u) = a lle — Apllg-. 


The following maximization of 


1 


can be carried out as described in Section 3.2.3. As special cases one obtains 
bivariate LIFU* and the cases 


A= (U's LAO uae Gs (cf. Nussbaum, 1976, 5.) 
with DO = of, 6.2 (cf. Casson, 1974) 


A=I1,,®Ip, M2 =9 (cf. Gleser and Watson, 1972). 
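
The concentration step over the covariance factor used in this section can be checked numerically. The following lines are a sketch under stated assumptions, not the book's notation: `sigma2` denotes the unknown variance factor, `N` the length of z, and `r2` a hypothetical value of the weighted residual norm $\|z - A\mu\|^2_{\Sigma^{-1}}$ for a fixed $\mu$. The likelihood is taken proportional to $\sigma^{-N} \exp(-r^2 / 2\sigma^2)$.

```python
import numpy as np

def log_lik(r2, sigma2, N):
    """Log of sigma^{-N} * exp(-r2 / (2 sigma^2)), up to an additive constant."""
    return -0.5 * N * np.log(sigma2) - r2 / (2.0 * sigma2)

N = 12          # hypothetical length of the observation vector z
r2 = 7.5        # hypothetical weighted residual norm for a fixed mu

sigma2_hat = r2 / N                       # closed-form maximizer of the factor
grid = np.linspace(0.2 * sigma2_hat, 5.0 * sigma2_hat, 2001)
best_on_grid = grid[np.argmax(log_lik(r2, grid, N))]

# Concentrated (profile) log-likelihood: only the residual norm is left.
profile = -0.5 * N * (np.log(r2 / N) + 1.0)
print(sigma2_hat, best_on_grid, profile)
```

Since the profile value is a decreasing function of `r2` alone, maximizing over $\mu$ reduces to the least squares problem of Section 3.2.3, as stated in the text.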


3.2.5 Linear functional relations with nonrandom experimental design 
under independent normally distributed errors 


Once again we will immediately treat the general LIFU* with linear regression 
part. The representation is based on Anderson (1951a, section 2). The model 
is given by (cf. (3.1.38)) 


Z=M,U+M.V+S, L''M,=0, 
(61) 
Q€T = (In @ Z| TEMP}. 


The identifiability was treated in Section 3.1.4, Theorem 3.1.3. According to this
we have $r([U; V]) = n$.
We put $M = (M_1 \mid M_2)$, $W = [U; V] \in M_{n \times m}$, $n = n_1 + n_2$ (cf. Section 3.1.3).


As distribution model we assume 

5 = Sim OM = {N(0, 2) | Qe T}. (62) 
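For $\Omega = \Lambda \otimes \Sigma$, by transforming the model with $\Lambda^{-1/2}$, i.e.

$$ \tilde{Z} := Z \Lambda^{-1/2}, \quad \tilde{U} := U \Lambda^{-1/2}, $$

one obtains the transformed error covariance

$$ \Omega = I_m \otimes \Sigma. \qquad (63) $$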




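For unique parametrization of the space $R(L) = \mathcal{L}$, only such matrices $L^{\perp}$
are considered for which

$$ L^{\perp\prime}\, \Sigma\, L^{\perp} = I_q. \qquad (64) $$

With $L^{\perp}$ also $L^{\perp}G$ is in the admitted parameter region for all orthonormal
matrices $G \in M_{q \times q}$.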

To obtain the MLE all stationary points of the likelihood function are to 
be computed. The MLE is among them if the likelihood function is bounded. 
This is shown as in regression theory. 


Theorem 3.2.8 If P* is an absolutely continuous distribution, then

$$ k_z(M, \Sigma) = -m \log \det[\Sigma] - \operatorname{tr}\, \Sigma^{-1} E E' \qquad (65) $$

is almost surely bounded for $m - n \ge p$, $M \in M_{p \times n}$, $\Sigma \in M^{>}_{p}$.


Proof. Because of [A] = 7[W’ ® I,] =n < m, &(A) is a proper subspace in 
IR", and because of Q-/24 = W’ @ 2-1? it results that 


|2-M22 — Q-VAul = (LZ — Pony) Q-1?2|/? 
= |(Py @ 1) (LZ @ 2-1). (66) 
Vor 2 := (Py 2-12) z, ee 2-123 Py1, holds and thus 


max k(M, Z) < —m log det [2] — tr 2-3/2 Pyiz’ - 
M : 

Because of 7(Py-1) = m — n ,it follows, according to [A.3.14], that S := ZPy4 

xZ' €M-> almost surely. Consequently, according to [A 3.15] we have 


max max k(M, 2) < —m ices det [S] + «< ow. Hi 
m 


= M 


The boundedness is not valid for $r(W) = m = n$. This has been shown by
Anderson and Rubin (1956) and Solari (1969) in their results on the unboundedness
of the likelihood function for bivariate LIFU*. Examinations of the solvability
of the ML equations for the multivariate LIFU* with linear regression
part are to be found in Florens et al. (1976) for the case $m = n$, and in Willassen
(1979). For $m > n$ the MLE can be obtained from an eigenvalue problem similar
to that in Section 3.2.3.


Theorem 3.2.9 (Anderson, 1951a) The MLE for LIFU* with linear regression 
part result from 


She = Cy = (C4, ..., Cy) € Moxgs : (67) 




where the c; are eigenvectors from 
(Q =m AS) CE = 0, 
S cam Sz.w> 


Q=0,¢, T= UI —V'(VV')-1 V) = UP,,- 
with


D= Diag (A, ae) Ap)» Ay S200 Ape C= (C;, C*) € Mx: 


The eigenvectors are normed according to 


C'SC =(I,+ D)y*, 8 =Sjm. 


(68) 


(69) 


(70) 


(71) 


The $p - q + 1$ greatest eigenvalues $\lambda_i$ are almost surely different. With $\hat{L}^{\perp}$, any
orthogonal transformation of it is also an MLE of $L^{\perp}$, if it satisfies (64).
With 
D, = Diag (A,; <s.544) 


3=8 + SL(1, + D,) DL'S 
M, = (1, —St+L') M,, 
M, = 8,757 
holds and we have 
M, = (Szv — M,8py) Sy’. 


The maximum value of the logarithmic likelihood function is


q 
(M1, S) = —“P + AP hog 2n— = log det [8] — = ¥1 
(M, 8) = —"P +P log 2n— F log det [8] — “FY log (1 
Proof. For the logarithmic likelihood function it holds that 
ae 1 
LAM, 2) = log 2% — = log det [2] — BR tr 5-10’. 


In connection with the restrictions one obtains the Lagrange function


i=14 tr(AM{L*) + - tr(B(LY'EL+ — 1)) 


(72) 
(73) 
(74) 
(75) 


(76) 


+ 4i). 
(77) 


(78) 


with Lagrange multipliers A € 7.,,, B¢ Mi. With the partial derivative 


with respect to Lt, 
AM), + BL'S=0 


(79) 




follows for the stationary points of J, and because of (63) and (64) we have 
AM{L* + BI SL = B=0. (80) 


Now we assume Syy = 0. Otherwise, instead of the initial variables we apply 
transformed ones: 


G=UI—P,), MM, = M,+ M,S8yyS>". (81) 


Then we would havef = E = Z — M,0 — M,V. Because of (80) and Syy = 0, 
the partial derivatives of lz with respect to Y, M,, M,, and L+ lead to 


m& —~to' =0 (82) 
WS eM oa ed = 0 (83) 
Z-1Szy — T-1MSy = 0 (84) 
AM, =0. (85) 


‘The solution of (84) provides M,. From (83) one obtains M,, since the multi- 
plication with 2’ and 2" provides, with (64), 


A = LI"'S8zy. (86) 
With that, from (83) one obtains 
M, = (I — ELLY) 8qySG, (87) 


and after replacing variables according to (81) this implies equation (74). 
Because of (86), (87), and from the definition of Q one obtains, with (85), 


(a LL) OL = 0, (88) 
Then (88) implies 
mS = 8 + SELYQI4+L"S. (89) 


Within the set of matrices admitted for Z+ we choose — if necessary after 
orthogonal transformation — those ones with 


PAP uOnue fs Dine tia, 1): (90) 


m 
Then (89) implies 

mS = 8 + mSL:DL"'2. (91) 
The multiplication with Z+ provides 

mZL1(I — D) = SL". (92) 




With that (88) implies 

QL — mZL!:D = 0 (93) 
or 

QL!(I — D) — mZL+(I — D) D = 0. (94) 
Hence (92) implies 

QL! = (Q+ 8) LD. (95) 


Consequently Z+ consists of q of the eigenvectors ¢,,...,¢, of the eigenvalue 
problem 


\Q —4(Q + 8)| = 0. (96) 


These eigenvalues are less than 1, since $Q + S \ge Q$. The relation between
the eigenvalues of (68) and (96) is


Dy, = DI, = Dy". (97) 


According to [A 3.14] the eigenvalues of (68) are almost surely different if 
m > n. Thus, for arbitrary eigenvectors O = (é,,..., é,) of (68) it holds that 


O'S = Diag (hy, ..., ky) (98) 


with certain constants k; > 0. We choose k; = m, j = 1,..., p. Then, for a 
certain matrix K we have: 


~ 


I'=6,K, K=Diag(k,,...,4,), 6, = (&,.--,&)- (99) 
Because of (92) and (64) the relation between K and D implies 

mI, — D) = L+'SL+ = KC’. 8C,,K = mR? . (100) 
that is, 

RK = ji, —B. (101) 


Now we show that the q smallest eigenvalues of S~!@ have to occur in Dy. 
Due to (91), (92), we have 


mz = S+ =A SLA. — D)-1 DiI -- D)-+ L's (102) 
m 
1s a ae 
=S+ am SC,(I — D) DOS. (103) 


From (98) and (101) we obtain 


OSE =1 +4 [1,10] DU — Dy (1,1 0). (104) 


1 




Because of (98) it further follows 
a ie e 
det [2] = det [C]-? J] 4,(1 + 4,). (105) 
j=1 
With that, according to (97), (98), we have, as likelihood function, 


1,(M, 5) = — “Flog 2n— ~ log det [8] — ~ Silog tian ve 
k=1 


(106) 


For every choice of the 1; , k = 1,...,¢, this provides the logarithms of the 
likelihoods over the stationary points of J,. Since 1, is bounded the maximum 
is just attained for the g smallest eigenvalues A, ..., 4,. Finally, the obtained 
estimates are transformed according to (81). i 
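
The eigenvalue problem of Theorem 3.2.9 can be set up in a few lines. The following Python/NumPy sketch is illustrative only: it assumes the error covariance has already been brought to the form (63), uses simulated data with hypothetical matrices `M1`, `M2` and an intercept row as regression part V, and normalizes the eigenvectors by $C'SC = I$ rather than by the norming of the theorem (the ranges of the eigenvectors are unaffected). The generalized problem $(Q - \lambda S)c = 0$ is reduced to a symmetric one via the Cholesky factor of S.

```python
import numpy as np

def proj_rows(W):
    """Orthogonal projector onto the row space of W (full row rank assumed)."""
    return W.T @ np.linalg.solve(W @ W.T, W)

rng = np.random.default_rng(2)
p, q, m = 3, 1, 60
L_perp_true = np.array([1.0, -2.0, 0.5])      # hypothetical normal vector
M1 = np.array([[2.0, 0.0, 2.0, 2.0],
               [1.0, 1.0, 2.0, 0.0],
               [0.0, 4.0, 4.0, -4.0]])        # columns orthogonal to L_perp_true
M2 = np.array([[1.0], [0.0], [-1.0]])
U = rng.normal(size=(4, m))
V = np.ones((1, m))                            # regression part: intercept
Z = M1 @ U + M2 @ V + 0.1 * rng.normal(size=(p, m))

W = np.vstack([U, V])
I_m = np.eye(m)
S = Z @ (I_m - proj_rows(W)) @ Z.T             # residual matrix S_{Z.W}
U_adj = U @ (I_m - proj_rows(V))               # U adjusted for V
Q = Z @ proj_rows(U_adj) @ Z.T

# Solve (Q - lambda S) c = 0 through S = R R'.
R = np.linalg.cholesky(S)
R_inv = np.linalg.inv(R)
lam, Y = np.linalg.eigh(R_inv @ Q @ R_inv.T)   # ascending eigenvalues
C = R_inv.T @ Y                                # generalized eigenvectors, C'SC = I
L_perp_hat = C[:, :q]                          # q smallest eigenvalues

c = L_perp_hat[:, 0]
cos = abs(c @ L_perp_true) / (np.linalg.norm(c) * np.linalg.norm(L_perp_true))
print("eigenvalues:", lam, " |cos| to true normal:", cos)
```

In this simulation the eigenvector of the smallest eigenvalue is nearly parallel to the normal vector used to generate the data; the matrix Q computed here also equals $Z(P_W - P_V)Z'$, in line with the decomposition discussed after the proof.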


Remark 3.2.2 The covariance estimate $\hat{\Sigma}$ is not always consistent (Villegas,
1961, for $q = 1$).


Remark 3.2.3 Theorems 3.2.4 and 3.2.9 obviously have something in common, and
the connection is a deeper one. In fact, for a general linear time series model
it was shown by Robinson (1974) that with the estimates constructed in such
a way from the above eigenvalue problem, every eigenvalue of the residual
matrix $\varepsilon\varepsilon'$ is minimized. In a unified way this implies Theorems 3.2.4 and 3.2.9
very elegantly, though with less elementary devices.

In Theorem 3.2.9 the matrix Q can be given in a still simpler representation,
providing, in particular, a device to explain the relations between econometric
estimation procedures and the MLE. If $S_W$, $W = [U; V]$, is nonsingular,
it holds that (Laha, 1957):


Qz..viv) — 92.0 = 82.0.0) 7= Szu.vScvSuzv- (107) 


This reflects the relation between the projectors $P_U$, $P_V$, $P_W$. For singular
models (that is, if $S_U$ or $S_V$ or $S_W$ are singular) this decomposition formula
holds only under additional assumptions. For this, $R(S_U) = R(S_{U.V})$ and
$R(S_V) = R(S_{V.U})$ are sufficient (Höschel, 1976). That can be interpreted graphically
in terms of the exogenous variables U and V. $R(S_U) = R(S_{U.V})$ means, for instance,
that the largest correlation between U and V has to be less than one (Höschel,
1974, 5.1).


Example 3.2.1 For LIFU* with replicated observations we have 


PEED Vie 


U = Diag (1;,,); t= 1,...,n;m = > m;. 


t=1 


Then U = 0 and Py = Py») = Py: holds. This implies 


S = S7w — 87:0 =, (24; ——<e4)) (zi; — 2;,)' = Wz, (108) 
Laas 


y] 


_ 


Q=A2.u aS) mi(z;, — Z,.) (%. — 2.) = Bz. 
w=1 


Let 6 = (€,, ..., €,) be those eigenvectors of 


(Bz —1,Wz)t;=0, Oi<¥,--- <A, (109) 
with 

Ow, C=ml,, 6 =(6,,6*), (110) 
and let 

Reo (O7 Dye 0* (111) 


Then we have 


and 
i+ = 6,. (112) 
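
For replicated observations the matrices of Example 3.2.1 are the within-groups and between-groups matrices of sums of squares and products. The sketch below is an illustration with simulated data (group sizes, design points on a hypothetical line and the error scale are assumptions): it forms $W_Z$ and $B_Z$, checks the decomposition of the total matrix, and solves $(B_Z - \lambda W_Z)c = 0$ for the smallest eigenvalue.

```python
import numpy as np

def centered_ssp(X, center):
    """Sum of squares and products of the columns of X about `center`."""
    D = X - center
    return D @ D.T

rng = np.random.default_rng(3)
p = 2
m_i = [4, 3, 5, 6]                               # replications per design point
centers = np.array([[0.0, 1.0, 2.0, 3.0],
                    [1.0, 3.0, 5.0, 7.0]])       # design points on y = 1 + 2x
groups = [centers[:, [i]] + 0.05 * rng.normal(size=(p, mi))
          for i, mi in enumerate(m_i)]
Z = np.hstack(groups)
grand = Z.mean(axis=1, keepdims=True)

W_Z = sum(centered_ssp(g, g.mean(axis=1, keepdims=True)) for g in groups)
B_Z = sum(g.shape[1] * centered_ssp(g.mean(axis=1, keepdims=True), grand)
          for g in groups)
T_Z = centered_ssp(Z, grand)                     # total matrix

# Eigenproblem (B_Z - lambda W_Z) c = 0 via the Cholesky factor of W_Z.
R = np.linalg.cholesky(W_Z)
R_inv = np.linalg.inv(R)
lam, Y = np.linalg.eigh(R_inv @ B_Z @ R_inv.T)
c1 = R_inv.T @ Y[:, 0]                           # normal direction of the fitted line
slope = -c1[0] / c1[1]
print("eigenvalues:", lam, " fitted slope:", slope)
```

The eigenvector of the smallest eigenvalue is normal to the fitted line, so the slope recovered from it is close to the value 2 used in the simulation.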


Remark 3.2.4 Starting from Definition 3.1.5, the question arises whether 
the MLE of B can be obtained also for explicit LIFU* instead of the implicit 
LIFU* considered up to now. That can be expected since for bivariate LIFUt 
with the parametrization « = [1, 6] &, only the 7-axis is not directly comprised, 
though it is obtained as special case for B = oo. In the general case eventually 
after permutation of the components of the z; and w;, respectively, let the first 
p — q components of ; be the independent variables &;, i.e. w; = [Ip-q B} &;. 
Consequently if the distribution of the smallest ¢ eigenvectors C, is not ‘patho- 
logical’ the (¢ X p)-matrix will be regular. By the symmetry of the problem 
this, of course, also holds for any other selection of g rows from C,. Now, to 
any QY = Q(2n)) and S = S(zn)) (cf. (68), (69)) there belongs exactly one suitably 
normed (p X p)-matrix C of eigenvectors. 

Under the decomposition C = (C,,C*), C, = (es) Cyl, Ox © Moy g we 
denote the set of those z,) for which the matrix C, is not invertible by Z. 
With that we have P(Z) = P({C | det [C,] = 0}). Now the determinants
are analytic functions of the elements of the matrix C. Furthermore, take into
consideration that the orthonormal $(p \times p)$-matrices form an analytic manifold
(cf. James, 1954, p. 43). Thus the elements of C are analytic functions
themselves within the local coordinates of this manifold. But the set of zeros 
of an analytic function is a null set (cf. Fisher, 1966, theorem 5.A.2). According 
to Girko (1975, corollary 4.3.1 (2)), it finally results that P(det [C,] = 0) 
as an integral of a certain density function with respect to the Haar measure 
over the just described null set of orthogonal matrices. With that, Z itself is 
a null set with respect to the distribution of Z,,). 




Example 3.2.2 Now we consider explicit LIFUt, $\mathcal{L} = R([I \;\; B']')$, $B \in M_{q \times (p-q)}$.
Then we have


KC,) = A(L+) = K[—B' i I,)). (113) 
Hence, for a regular transformation F € M,, 

0,F-1 = [38 tdale (114) 
Therefore, with the partition | 

Cee Opry in Og Macy (115) 
one obtains the equation 


Be —C,Co! (116) 


According to Remark 3.2.4, C, is almost surely regular. With the natural 
decomposition of Sz yw into Sy ySyxy.w, etc., it easily follows for the column 
vectors of C, and C,, that 


¢; = —(Qyv.u.v — ASy.w) 4 (Qrx.u.v — ASrx.w) & (117) 
and 
¢: = (—Qy.u.v — AiSx.w) 1 (Qxv.u.v — ASxv.w) Ci- (118) 


With that, for g = 1, (118) implies a known formula for the LIML in linear 
simultaneous equations which have only one structural equation: 


6, = —(Qx.v.v — ASx.w) + (Qxv.u.v — ASxy.w). (119) 
In general, no such simple explicit formula holds, only 


Bee {((Qx.u.v —A Syx.w)* (Qxy.uv — 4Sxyv.w) G;) dnt C,*: (120) 
The generalization for v > 1 is to be found in (3.3.29), (3.3.30). 
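
The passage from the implicit to the explicit parametrization in this example amounts to a block inversion. The following sketch uses its own conventions, which are assumptions and need not coincide with the partition (115): the first $p - q$ coordinates are taken as independent variables, `L_perp` is any $p \times q$ matrix with $R(L^{\perp}) = R([-B';\, I_q])$, and its lower $q \times q$ block is assumed regular. The function name `explicit_from_implicit` is hypothetical.

```python
import numpy as np

def explicit_from_implicit(L_perp):
    """Recover B from a p x q matrix whose range equals that of [-B'; I_q]."""
    p, q = L_perp.shape
    top, bottom = L_perp[:p - q, :], L_perp[p - q:, :]
    # L_perp = [-B'; I_q] F for some regular F, hence top @ inv(bottom) = -B'.
    return -(top @ np.linalg.inv(bottom)).T

B_true = np.array([[2.0, -1.0]])                 # hypothetical B, q = 1, p - q = 2
q = B_true.shape[0]
F = np.array([[3.0]])                            # arbitrary regular q x q factor
L_perp = np.vstack([-B_true.T, np.eye(q)]) @ F   # implicit representation

B = explicit_from_implicit(L_perp)
xi = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, -1.0]])                # independent variables
mu = np.vstack([xi, B @ xi])                     # points mu_i = [I; B] xi_i
print("B =", B, " max |L_perp' mu| =", np.abs(L_perp.T @ mu).max())
```

The regularity of the inverted block is exactly what Remark 3.2.4 secures almost surely for the estimated eigenvectors.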


3.2.6 Nonlinear models with known error covariance 


Though in practical applications one can seldom work with known error co- 
variance, such models are of fundamental importance also in cases with unknown 
covariance. Often one will apply two-stage estimators, that is use a suitable 
estimate for the unknown covariance and then proceed as with known co- 
variance. The basis is a model with nonrandom experimental design 


O=s(ui,2), O= p(x), (121) 


= fit op Or emara On 




with the distribution assumption 
Cin © P@ = {P, | HO = 0, DE = 2} (122) 


for fixed and known covariance matrix 2 € M7,,. According to Section 3.2.2, 
with given estimation functional /,(-) one obtains estimates #, « from 
[A, #] = arg min 1,(u, 2). (123) 
O=8;(43,%) 
0=p() 

Such extremal problems can be solved only iteratively and on sufficiently large
computers. Modern derivative-free descent methods secure convergence of
the corresponding algorithms (cf. Schwetlick, 1979). For quadratic objective
functionals (that is, in particular, for WLSE) one even has global convergence
to a local minimum. In comparison with general nonlinear
minimization problems, the special form of the present model implies a special
structure for the known iteration methods. These will be described in Section
3.8. For every iteration procedure, however, we need a suitable initial iterate
for $\mu$ and $\kappa$, and the choice is difficult. The most reasonable and hardly
replaceable initial approximation for $\mu_{(n)}$ is $z_{(n)}$, or for replicated observations
$\bar{z}_{(n)}$. Such a 'natural' initial approximation does not exist for $\kappa$. This problem
is not treated further here. Egerton and Laycock (1979) showed that
even with that initial iterate one of the best-known iteration procedures can
fail to converge to the global minimum. To be sure, in practical applications it
is already valuable to have an improvement of curve-fitting in comparison
with the initial estimate, as is the case when a local minimum is reached. But
if such a local improvement does not suffice, the global minimum has to be
determined.

For this the method of Lagrange multipliers is suitable. The global minimum
is among the stationary points of the Lagrange function with arguments
$\vartheta = (\mu_{(n)}, \kappa, \lambda, \chi)$:

$$ \operatorname{lag}(\vartheta) = l_z(\mu_{(n)}, \kappa) + \lambda' s_{(n)}(\mu_{(n)}, \kappa) + \chi' \rho(\kappa) \qquad (124) $$

in case that it exists and the functions are sufficiently smooth, where $\lambda$, $\chi$
are vectors of Lagrange multipliers. The stationary points result as solution
of the following system of equations:


O = yb, + WN On8(ny + #' OnP s 
O= 0), + 1 0,89); (125) 
0 = %m(u,2), 0= pln). 
For explicit models the stationary points result from the equations
0= OM wlE (ns I); mn) , 


(126) 
O = ,,1(m(E(m, %)> 2). 




In general, such equations are as difficult to handle as are minimization pro- 
blems. They also have to be solved iteratively. For polynomial models which 
are of great practical importance, one can fall back on more precise procedures 
to define the roots. That is why it is of interest to describe more precisely 
these equations for important special cases and to show simplifications. 

Starting from the general equations we want to give their form for the com- 
putation of WLSE. In this case 


L(u) = |lz — pllo—n- 


With that, for explicit models the following system of equations is obtained 
(cf. Britt and Luecke, 1973): 


z= pt QA SA, 
O = 04 5(nA + Onpx, (127) 
0 = sm(u,2), 0 = ple). 
For explicit models with $\rho = 0$ a simpler form follows:
=a, OU iG, ot) = O'0, Oar i(E is) 110. on (128) 
0 = 2z,,U(u(é, x)) = C’Q-1 Diag oa: 8;,ri(€i, 2)])- 


In a slightly differing form these equations are also to be found in Dolby (1972)
for $d_i = 2$.

Concluding this problem, we deal with the case of independent replications 
of observations of a fixed experimental design since it is of outstanding prac- 
tical importance. Let 


Zig = Bit Ciy | aed Ohare LF 


(129) 
Dei; = Dine 
Then (cf. Rao, 1973, 8a, 5.4) we have 
L(u) = > > ley — ells 
i=1 j=1 
= |2(n). — “lla + DY tr (2;78)) (130) 
i=1 
with 
Q = Diag (2;/m;) 
fee (131) 




and with the natural partition of 27+ into 2;, etc., and with 6;. = (&%. — &i), 
&, = 9. — ri(éi, 2), one obtains 


Ym, O47 (2;, + L8e;,.) = 0, 
= (132) 


(mi(2ers( EH, + 2,8) + 28s, + Z88;,)) = 0. 
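
The decomposition (130) for replicated observations is the usual splitting of a weighted sum of squares into a part involving only the group means and a pure-error part that does not depend on $\mu$. The sketch below checks it numerically; the group sizes, the matrices $\Sigma_i$ and the trial design points are hypothetical, and the identity is written in the form $\sum_i \sum_j \|z_{ij} - \mu_i\|^2_{\Sigma_i^{-1}} = \sum_i m_i \|\bar{z}_{i.} - \mu_i\|^2_{\Sigma_i^{-1}} + \sum_i \operatorname{tr}(\Sigma_i^{-1} S_i)$.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 2
m_i = [3, 5, 4]                                   # replications per design point
Sigmas = [np.array([[1.0, 0.2], [0.2, 0.5]]),
          np.array([[0.8, 0.0], [0.0, 0.3]]),
          np.array([[1.5, -0.4], [-0.4, 1.0]])]   # hypothetical known covariances
mus = [rng.normal(size=d) for _ in m_i]           # trial design points mu_i
data = [rng.normal(size=(mi, d)) for mi in m_i]   # rows are the replications z_ij

lhs = 0.0        # full weighted sum of squares
fit_part = 0.0   # part depending on mu through the group means only
pure_part = 0.0  # pure-error part, independent of mu
for z, mu, Sig in zip(data, mus, Sigmas):
    Si = np.linalg.inv(Sig)
    zbar = z.mean(axis=0)
    lhs += sum((zj - mu) @ Si @ (zj - mu) for zj in z)
    fit_part += len(z) * (zbar - mu) @ Si @ (zbar - mu)
    Rm = z - zbar
    pure_part += np.trace(Si @ (Rm.T @ Rm))

print(lhs, fit_part + pure_part)
```

Because the pure-error part is constant in $\mu$, the minimization only has to be carried out for the group means with covariance $\Sigma_i / m_i$, which is the content of (131).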


It is practically and theoretically important that not only the true experi- 
mental design is identifying, but also its estimator, at least for almost all obser- 
vations. Of course, this depends mainly on the distribution of observations, 
but also on the estimation procedure and on the structure bundle. Moreover, 
this identification property is not only important for the minimization esti- 
mator itself. 

Practical computation of minimization estimators is always performed
iteratively. But no iteration procedure can yield practically utilizable convergence
to the global minimum, that is, to the desired minimization estimator. However,
modern iteration procedures provide convergence at least towards local minima
for arbitrary initial estimates. In comparison to older, less effective algorithms,
this is an improvement because the latter converge locally, but in general they
can oscillate or diverge. That is why, beside the minimization estimator itself,
all other critical points of the estimation functional should uniquely deter- 
mine a state surface as well. The state surfaces belonging to different critical 
points may be different. If this is the case, then, in general any critical point 
approached by an iteration procedure will determine exactly one system para- 
meter, which we will take to be the estimator. Otherwise, if there were two or 
more state manifolds containing the estimated design, difficulties would arise 
in the interpretation of the curve fitting. 


Definition 3.2.1 Let the estimation functional $l_z(\cdot)$ be given. For almost all
observations $z_{(n)}$ let all of the (possibly different) l-minimizing estimates $\hat{\mu}(z)$
be identifying for the system parameter. Then we say that the structure bundle Sy
has the property of identifying extremal points w.r.t. the estimation functional l.
If this holds for all critical points (not just local minima) of the estimation functional,
then Sy is said to be identifiable by the critical points of l.


Consequently, after the description of identifying experimental designs in 
Section 3.1.4 it must be a further aim to give practically checkable conditions 
to the system equations, which secure this property. It is to be seen that, essen- 
tially, these are the same ones that are already valid for the identifying experi- 
mental designs. For ‘genuine’ errors-in-variables models in which at least one 
of the ‘independent’ state variables is observed with errors, the following 
statement is obtained: if different state manifolds have no weak contact of 
infinite order and » > 2 dim JJ, then almost all observations 2,) provide 
WLSE $\hat{\mu}(z)$ of the experimental design, which identify the structural parameter
(cf. Höschel, 1978b, 1986).




3.2.7 Models with unknown error covariance 
under normally distributed errors 


In such models the MLE result from the minimization problem described in 
(42), (43). The MLE is obtained for z = 2m) = (zi); with i= 1,...,2, 
fj == V,oen, My {rom 


L,(?) = min min min1,(u, x, Qy)). (133). 
) mE] BESy, =ver 

Now the minimization over $\Omega$ can be carried out for several important special
cases. As will be seen, the resulting estimates $\hat{\Omega} = \hat{\Omega}(z, \mu, \kappa)$ are determined
uniquely. In those stationary points of $\log l_z$ which provide the minimum, $\Omega$
then has to have exactly this value. On the other hand, the stationary points
of $\log l_z$ are obtained from the same normal equations as in Section 3.2.6,
where an additional equation still occurs for $\Omega$. But the latter can immediately
be eliminated by inserting $\hat{\Omega}$.


Theorem 3.2.10 The normal equations with unknown covariance are obtained
from those with known covariance by inserting the estimator $\hat{\Omega}$ in the latter.


Now we describe $\hat{\Omega}$ more precisely for important models, starting with the
practically less important one.


Model 1. Independent replications of the observation of a fixed experimental 
design, with 


Q2=1@ 2, @ = (2;)ja1... bo ieee = (Cie lee 


(134) 
CD gen Oe | heginer ew Be 
One obtains

$$ l_z = k \log \det[\Omega] + \operatorname{tr}\, \Omega^{-1} S_Z, \qquad S_Z = \sum_{j=1}^{k} (z_j - \mu)(z_j - \mu)'. \qquad (135) $$

Then we have (cf. [A 3.14])

$$ \hat{\Omega} = S_Z / k, \qquad (136) $$

if $S_Z$ is regular. This is almost surely the case for $k > np$. The most important
case in practice is the following.


Model 2. Independent errors over different experimental points with different 
covariance: 


C= Ding lat @ 2) ie Te Mg ty aya ys X,))~¢ (187) 




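Then we have

$$ l_z = \sum_{i=1}^{n} \left( m_i \log \det[\Sigma_i] + \operatorname{tr}\, \Sigma_i^{-1} S_i \right), \qquad S_i = \sum_{j=1}^{m_i} (z_{ij} - \mu_i)(z_{ij} - \mu_i)'. \qquad (138) $$

It results (cf. [A 3.14]) that

$$ \hat{\Sigma}_i = S_i / m_i, \qquad (139) $$

in case that $m_i > d_i$.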


Model 3. Independent observations with equal covariance: 


= Ts, m= ym, 
(140) 
SeM eek == A 2 ae 


Then (cf. [A 3.14])

$$ \hat{\Sigma} = S / m, \qquad S = \sum_{i,j} (z_{ij} - \mu_i)(z_{ij} - \mu_i)', \qquad (141) $$


in case that m > n > d,, therefore at least one m; > 1. The maximum likeli- 
hood equations have a somewhat different form if data vectors sorted by the 
variables Z(m) = [2(}?,, ..., 2\?] are used instead of the object-sorted z = (2;),,...m 
(cf. Dolby and Freeman, 1975). 

Finally we remark that with missing replications of observations, similar
problems occur as with linear models. Instead of equation (3.1.64), some in- 
equalities occur which are functions of the solutions of the likelihood equations 
(Dolby and Lipton, 1972, 1.(3)). In general these inequalities contradict the 
consistency of the solutions of the likelihood equations. 


3.3 Further estimation procedures 


So far the MLE, and the WLSE connected with it, have been presented for the
estimation of general functional relations, since these estimators are
especially suitable for a great number of practical applications. Above all
this is true for data-fitting problems in the natural and industrial sciences.
There a fixed experimental design can be observed repeatedly, and to a good
approximation the observation errors can be assumed to be normally distributed
and mostly even componentwise independent. In these cases the WLSE and MLE
provide consistent estimators which, under certain regularity assumptions and
because of the fixed number of parameters, also have the asymptotic optimality
properties of the MLE (cf. Section 3.5.1).


From the preceding sections it has become obvious that the application
of MLE is limited. Here we are thinking less of the fundamental problems
of MLE (cf. Weiss and Wolfowitz, 1974) than of the many problems arising
with nonreplicated measurements, with nonnormally distributed or dependent
errors, or with a random experimental design. Furthermore, even for models
in which the MLE is consistent and asymptotically optimal, imperfections
of the MLE such as nonexistent moments, bias, or complicated computation
motivate the construction of further estimators.

The present section gives a survey of alternatives to MLE and explains
their motives, principal rules of computation, as well as their possibilities
and limits. The alternatives often result from asymptotic considerations;
however, their introduction will not be based on those details.
A more detailed presentation of asymptotic properties is to be found in
Section 3.5. Moreover, a very detailed presentation of asymptotics for a
special but very large class of estimators on LIFU* with independent errors
of observation over the single experimental points is to be found in Section 3.4.


3.3.1 Linear functional relations with independent errors 


3.3.1.1 Introduction 


To begin with, MLE and WLSE provide consistent estimates if the number
of incidental parameters remains finite, that is, in particular, for a constant
experimental design. This also holds for more general cases if the sequence of
experimental designs satisfies specific conditions (cf. Section 3.4). It is often
possible to get additional information about the model by observing further
variables. The use of this additional information also yields consistency, even
if one assumes less about the sequence of experimental designs than in the
corresponding initial model for the computation of the WLSE. This is especially
true for models where no replicated observations can be assumed (cf. Section
3.2.1.3). Of course, we are also interested in estimates for nonnormal models
with random experimental design. However, the MLE is hard to compute in such
cases. For clarity the following presentation will be carried through mostly
for bivariate LIFU. In most cases the generalizations to multivariate
LIFU are then obvious.


3.3.1.2 Ordinary and orthogonal least squares estimation 


First note that the OLSE and orthogonal LSE (cf. Section 3.2.2) are widely 
applicable because of their simplicity (cf. Section 3.1.2). One has to remember, 
however, that the OLSE is inconsistent and the variance does not exist for the 
orthogonal LSE. For the present we do not assume replicated observations. 


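Then it holds that

$\hat\beta_O = \hat\beta_{OLSE} = d_{xy}/d_x$, (1)

$\hat\beta_{OR} = \hat\beta_{ORLSE} = \begin{cases} \left( (d_y - d_x) + \sqrt{(d_y - d_x)^2 + 4d_{xy}^2} \right) / 2d_{xy} & \text{for } d_{xy} \ne 0, \\ 0 & \text{for } d_{xy} = 0 \text{ and } d_x > d_y, \\ \infty & \text{for } d_{xy} = 0 \text{ and } d_x < d_y \end{cases}$ (2)

(cf. Madansky, 1959).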
For observations distributed according to $z_i \sim N([\xi_i, \alpha + \beta\xi_i], \Sigma)$, the
likelihood function is unbounded (cf. Anderson and Rubin, 1956, p. 130). Although
there is no MLE as alternative, these estimates can be compared. Using the
MSE as criterion, $\hat\beta_O$ has to be preferred since $\hat\beta_{OR}$, in contrast to $\hat\beta_O$,
has no finite moments. But to compare the estimates one can also use the
probability that these estimators fall in an interval of fixed length which
contains the true parameter. This problem will be treated more precisely in
Sections 3.5.2 and 3.5.3.
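The closed forms (1) and (2) are easy to evaluate from the second sample moments. The following is a minimal sketch, assuming that $d_x$, $d_y$, $d_{xy}$ denote the centered second sample moments divided by the sample size; the function names are hypothetical, and the vertical-line case of (2) is represented here by `math.inf`.

```python
import math

def second_moments(xs, ys):
    """Centered second sample moments (d_x, d_y, d_xy), normalized by n (assumed convention)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    d_x = sum((x - mx) ** 2 for x in xs) / n
    d_y = sum((y - my) ** 2 for y in ys) / n
    d_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return d_x, d_y, d_xy

def olse_slope(d_x, d_xy):
    """Ordinary least squares slope, equation (1)."""
    return d_xy / d_x

def orlse_slope(d_x, d_y, d_xy):
    """Orthogonal least squares slope, equation (2)."""
    if d_xy != 0:
        diff = d_y - d_x
        return (diff + math.sqrt(diff ** 2 + 4 * d_xy ** 2)) / (2 * d_xy)
    if d_x > d_y:
        return 0.0       # horizontal line
    if d_x < d_y:
        return math.inf  # vertical line (representation chosen for this sketch)
    raise ValueError("slope is undetermined when d_xy = 0 and d_x = d_y")

# Points lying exactly on y = 1 + 2x: both estimators recover the slope.
d_x, d_y, d_xy = second_moments([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
slope_ols = olse_slope(d_x, d_xy)
slope_orth = orlse_slope(d_x, d_y, d_xy)
```

With noisy data the two slopes differ; for $d_{xy} \ne 0$ the orthogonal slope is never smaller in absolute value than the ordinary one.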

For practical purposes another representation of the ORLSE for bivariate
LIFU is useful. Namely, if

$\varphi := \tan^{-1}\beta$, (3)

then

$\tan 2\hat\varphi_{OR} / 2 = d_{xy}/(d_x - d_y)$ (4)

holds (cf. e.g. Malinvaud, 1956).

Starting from this, the ORLSE can be modified by minimizing the weighted
rather than the unweighted sum of orthogonal distances. That is, let $m_i$ be
the weights for the observations $z_i$; then the weighted ORLSE (cf. Ware, 1972)
$\hat\beta_W$ is given by

$\tan 2\hat\varphi_W / 2 = \dfrac{\sum_i m_i (x_i - \bar x_\cdot)(y_i - \bar y_\cdot)}{\sum_i m_i \left[ (x_i - \bar x_\cdot)^2 - (y_i - \bar y_\cdot)^2 \right]}$. (5)

The essence of this construction corresponds to the step from OLSE to GLSE 
in the regression model. 
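The angle representation is convenient numerically because the two-argument arctangent selects the correct branch. The following is a minimal sketch of the weighted orthogonal fit, assuming that the means are the weighted means and using hypothetical names; it is not taken from Ware (1972).

```python
import math

def weighted_orlse_slope(xs, ys, weights):
    """Weighted orthogonal least squares slope via the angle form tan(2*phi)/2."""
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total  # weighted mean (assumption)
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    s_x = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    s_y = sum(w * (y - y_bar) ** 2 for w, y in zip(weights, ys))
    # atan2 picks the direction of the major axis of the weighted scatter,
    # i.e. the line minimizing the weighted sum of squared orthogonal distances.
    phi = 0.5 * math.atan2(2.0 * s_xy, s_x - s_y)
    return math.tan(phi)

# Points exactly on y = 1 + 2x with unequal weights.
slope_w = weighted_orlse_slope([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], [1, 2, 3, 4])
```

If the weighted cross moment vanishes and both spreads are equal, the direction is undetermined; the sketch then simply returns 0.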


If the weightings come from replicated observations, then we have

$\tan 2\hat\varphi_W / 2 = b_{xy}/(b_x - b_y)$. (6)

These estimates remain consistent and asymptotically normally distributed
under the assumptions explained in detail in Section 3.4.


3.3.1.3 Instrumental variables


The following grouping estimator due to Wald (1940) is still more simply defined 
than the OLSE and geometrically as clear as the ORLSE. It is an advantage 
of this estimator to be simple as well as equivariant. Namely, the observations 


are split into two groups, $z_{ij}$, $i = 1, 2$; $j = 1, \dots, m/2$ for even $m$, and the
line is drawn through the means of both halves of the observations (cf. Figure
3.3.1). Thus let

$\hat\beta_G = (\bar y_{2\cdot} - \bar y_{1\cdot})/(\bar x_{2\cdot} - \bar x_{1\cdot})$. (7)
Bartlett (1949) showed that for equidistant $\xi_i$ observed without error, a smaller
variance can be obtained by omitting the middle third of the observations.
The condition for consistency (Wald, 1940) is

$\liminf_{n\to\infty} |\bar\xi_{1\cdot} - \bar\xi_{2\cdot}| > 0$. (8)

Fig. 3.3.1


This condition is obvious. The corresponding sample means for the experimental
design points have to differ for the groups. If the $\xi_{ij}$ are randomly
distributed, the expectations of their generating distributions have to be
different for $\xi_{1j}$ and $\xi_{2j}$. All these estimators are a special case of

$\hat\beta_{IV} = \sum_i u_i y_i / \sum_i u_i x_i$. (9)

In Wald's estimate we have $u_i = \pm 1$ according to whether $z_i$ is from the
first or second group.
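That the grouping estimator is a special case of the ratio form (9) can be checked directly. The following is a minimal sketch, assuming that the observations are already ordered so that the first half forms group 1, and that the variables $u_i$ sum to zero (as they do for $\pm 1$ with equal group sizes); the function names are hypothetical.

```python
def wald_slope(xs, ys):
    """Grouping estimator: slope of the line through the two group means."""
    m = len(xs)
    if m % 2 != 0:
        raise ValueError("an even number of observations is assumed")
    h = m // 2
    x1, x2 = sum(xs[:h]) / h, sum(xs[h:]) / h
    y1, y2 = sum(ys[:h]) / h, sum(ys[h:]) / h
    return (y2 - y1) / (x2 - x1)

def iv_slope(us, xs, ys):
    """Ratio form (9): sum(u*y) / sum(u*x)."""
    return sum(u * y for u, y in zip(us, ys)) / sum(u * x for u, x in zip(us, xs))

xs = [0.1, 0.9, 2.2, 2.8]
ys = [1.0, 3.2, 4.9, 7.1]
us = [-1, -1, 1, 1]  # -1 for the first group, +1 for the second

slope_grouping = wald_slope(xs, ys)
slope_ratio = iv_slope(us, xs, ys)
```

Both calls return the same value, since the common factor $m/2$ cancels in the ratio.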

Now the variables $u_i$ need not come from a grouping criterion which is
independent of the observations. They may also be further variables which
act in the system described by the LIFU but do not occur in the LIFU
itself, and which are in some way connected with the unknown experimental
design; that is, these variables are correlated with the unknown
experimental design.

Let the empirical correlation be different from zero:

$\liminf_{n\to\infty} \left| \frac{1}{n} \sum_{j=1}^{n} u_j \xi_j \right| > 0$. (10)

On the other hand, these so-called instrumental variables (IV) $u_j$ are to be
independent of the observational errors $\zeta_j$. That means that, almost surely,

$\lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} u_j \zeta_j = 0$. (11)

Then on the assumption of (11) one can easily show that (10) is necessary
and sufficient for strong consistency of the instrumental variables estimate
(IVE) $\hat\beta_{IV}$. (Wald's condition (8) is just equivalent to (10) for two groups.)
The question whether a variable u could be suitable as IV or not has to be 
decided for each estimation problem separately. In applications from human 
and social sciences such variables are used more frequently than in natural or 
industrial sciences. Nevertheless, instrumental variables can be obtained 
sometimes in the latter. An example is the observation of a moving object 
at discrete time points. Then the ranks $r_i$ of $\xi_i$ are known as instrumental
variables and, for instance, any of the estimates

$\hat\beta_{r_i r_j} = (y_{r_i} - y_{r_j})/(x_{r_i} - x_{r_j})$, $i < j$, (12)

can be used, or the mean of the $\hat\beta_{r_i r_j}$, or their median. This problem was investigated
by Ware (1972). In this model he proved asymptotic normality for some IVE. 
This is treated in Section 3.4 in a more general context. The IVE is consistent, 
but with finite samples there is the disadvantage that every IV necessarily 
enlarges the variance. With errorless measured variables the OLSE is to be 
preferred anyhow. For that reason Feldstein (1974) proposed to combine both 
estimates by 


$\hat\beta_F = \alpha\hat\beta_O + (1 - \alpha)\hat\beta_{IV}$, $0 \le \alpha \le 1$, (13)


where the MSE is to be minimized over «. Moreover, a pretest estimate is given 
there: 


Bos if Qed . 
Kal (Gree BY am a 

where 
$Q := \mathrm{MSE}(\hat\beta_O)/\mathrm{MSE}(\hat\beta_{IV})$. (15)


Of course, the question arises whether a ‘more natural’ correction of the OLSE 
could provide a consistent estimate. This is possible if there is an estimate of 
the error covariance matrix, which is independent of the errors $,,). For in-
stance, w,(m —-n) is a consistent unbiased estimate of D(§) + D(d), where
the expression 

D(§) = lim = Sz < 00 (16) 


n—>co 1 
is defined both for non-random and for random §;. Then 
B = (wzy|(m — m) — 65.)/(wz](m — m) — 65) (17) 


is a consistent estimate. 


3.3.1.4 Use of variance components 


The modification given in (17) is essentially based on a consideration of the
expectations of the corresponding sums of squares. This procedure was
introduced by Tukey (1951). For instance, it is easy to see that under
replications, the estimators

$\hat\beta = b_{xy}/(b_x - w_x)$ (18)

and

$\hat\beta = (b_y - w_y)/b_{xy}$ (19)

are consistent. Further estimates of this kind are to be found in the survey
of Madansky (1959) (see also Dorff and Gurland, 1961a, b).
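The two variance-component estimators can be computed from the between and within mean squares of replicated data. The following is a minimal sketch under stated assumptions: $b$ and $w$ are taken to be the usual analysis-of-variance mean squares (between design points, and within design points), and the errors in $x$ and $y$ are assumed uncorrelated; the normalization used in the text may differ, and the function name is hypothetical.

```python
def tukey_slopes(groups_x, groups_y):
    """Return the slopes b_xy/(b_x - w_x) and (b_y - w_y)/b_xy.

    groups_x[i] and groups_y[i] hold the replicated observations
    at the i-th point of the experimental design.
    """
    n = len(groups_x)                      # number of design points
    m = sum(len(g) for g in groups_x)      # total number of observations
    grand_x = sum(sum(g) for g in groups_x) / m
    grand_y = sum(sum(g) for g in groups_y) / m
    b_x = b_y = b_xy = 0.0
    w_x = w_y = 0.0
    for gx, gy in zip(groups_x, groups_y):
        r = len(gx)
        mx, my = sum(gx) / r, sum(gy) / r
        b_x += r * (mx - grand_x) ** 2
        b_y += r * (my - grand_y) ** 2
        b_xy += r * (mx - grand_x) * (my - grand_y)
        w_x += sum((x - mx) ** 2 for x in gx)
        w_y += sum((y - my) ** 2 for y in gy)
    # Mean squares: between (n - 1 degrees of freedom), within (m - n).
    b_x, b_y, b_xy = b_x / (n - 1), b_y / (n - 1), b_xy / (n - 1)
    w_x, w_y = w_x / (m - n), w_y / (m - n)
    return b_xy / (b_x - w_x), (b_y - w_y) / b_xy

# Error-free replications on y = 1 + 2x: both estimators give the slope exactly.
slope_18, slope_19 = tukey_slopes(
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]],
)
```

Subtracting the within mean square removes the error variance from the between mean square, which is what makes the ratio consistent.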

Besides the estimators stated up to now, there exist further ones for bivariate
LIFU which are less often applied. Among them are the estimators developed by
Neyman, Scott, Kiefer, and Wolfowitz, as well as the method of cumulants and
others (cf. Section 3.1.6).


3.3.1.5 Limited-information maximum likelihood 
and two-stage least squares estimators 


The consistency obtained by modifying the OLSE raises the question
whether modifications of the MLE also yield better estimates. We can start
from equation (3.2.119). For q = 1, a modification is obtained if the smallest
eigenvalue of $S_{Z.W}^{-1}Q_{Z.U.V}$ is replaced by an arbitrary fixed or random
constant $\lambda \in \mathbb{R}^1$:

$\hat\beta = (Q_{X.U.V} - \lambda S_{X.W})^{-1}(Q_{XY.U.V} - \lambda S_{XY.W})$. (20)

If $\lambda = \lambda_1(S_{Z.W}^{-1}Q_{Z.U.V})$, the MLE is obtained, which in econometrics is also
known as MLE under limited information (LIML). For $\lambda = 0$ the 2SLS-estimate
is obtained. Finally, according to (3.2.107), for $\lambda = 1$ there results the
estimate $S_{X.V}^{-1}S_{XY.V}$ which in the case V = 0 becomes the OLSE in the LIFU*
if it is considered as a linear multivariate regression model. Because it is im- 
portant in the literature concerning linear simultaneous equations, the original 
construction of the 2SLS-estimate and LIML will be explained here. Thus the 
relations between linear simultaneous equations and LIFU* are examined 
from another aspect. For the present we consider the case V = 0, U +I, 
(cf. (3.1.37)). Then M can be estimated by the OLSE M with U as regressor 
in the first step. For 2 = I ® 2 we have 


~ 


W2 TP. VORM SMe ODM = (GU)2A@z (21) 


(cf. Bunke and Bunke, 1986, (251515), with Ci CX XA A) ue for 
explicit LIFU we have 


AYE = [é:, Be tae 


M=(Xj%],  XEMooxn (22) 


Y = BX +3 
Dé = (UU’)1 @ (—B | I,) S{|—B'l,] =: A @ 2;: 


In the second step the 2SLS-estimate is obtained if the LIFU* (22) is inter- 
preted as regression model. In this model we have U = I,, corresponding to 
(3.1.37). With that the 2SLS-estimator is 


B= (KA2X') FAY’ = OF yOQxv.v (23) 


and for q = 1 there results the form of the 2SLS-estimator already derived
from (20). Now (23) is defined also for q > 1, but then it cannot be brought
into relation to other eigenvalue estimators.
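The two steps described above can be illustrated in the simplest possible setting. The following is a minimal sketch, assuming one regressor, one instrument and no intercept; the names are hypothetical and the general matrix case is not covered. In this setting the two-stage result coincides with the ratio $\sum u_i y_i / \sum u_i x_i$.

```python
def two_stage_slope(us, xs, ys):
    """Two-stage least squares with one regressor x and one instrument u."""
    # Stage 1: OLS of x on u gives the fitted regressor.
    gamma = sum(u * x for u, x in zip(us, xs)) / sum(u * u for u in us)
    x_fit = [gamma * u for u in us]
    # Stage 2: OLS of y on the fitted regressor.
    return sum(xf * y for xf, y in zip(x_fit, ys)) / sum(xf * xf for xf in x_fit)

def iv_ratio(us, xs, ys):
    """Ratio form sum(u*y) / sum(u*x) for comparison."""
    return sum(u * y for u, y in zip(us, ys)) / sum(u * x for u, x in zip(us, xs))

us = [-1.0, -0.5, 0.5, 1.0]
xs = [-0.8, -0.7, 0.4, 1.1]
ys = [-2.1, -1.0, 1.2, 1.9]

slope_2sls = two_stage_slope(us, xs, ys)
slope_iv = iv_ratio(us, xs, ys)
```

The agreement follows because the first-stage coefficient cancels once in the second-stage ratio.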

The LIML, too, was originally constructed starting from (22). For this 
a two-stage WLSE was used in (22), in which a criterion with estimated co- 
variance 2, was taken instead of the original least squares criterion belonging 
to model (22): 


te [27 (¥ — BX) AY BX)] (24) 


The corresponding estimator 2; of X; is based on one of Z. Starting from (21) 
we choose 


pes Dz_ su = Szy (25) 


as an estimator of 2. 
(Then 5/(m — n) is an unbiased estimator of 2.) 
Then, with the help of 


435 ee T,) 2[—B' + 1, (26) 
the LIML is defined as 
min tr [((—B}J,) S| —B’ t Iq) (¥ — BX) A-U¥ — BX)’. (27) 
- 
Because of MA-1M = ZPyZ' = Qz.v, this is equivalent to 
min [{tr PZ254Qz.y27""}] 
B 


with = R([SY?—B’ : I,]). The problem was already treated for 
ft = R([—B' ; I,]) = RL") 


in Section 3.23. Thus the matrix L+ is obtained from the eigenvectors belonging 
to the q smallest eigenvalues of S71,Q7.y. With the transformation described 
in Example 3.2.2, Section 3.2.5, one obtains B. According to (3.2.119) this is 
just the estimate constructed as LIML, in which f+ and B are connected due 
to (3.2.116). But take into consideration that for U = I,, V = 0, the MLS 
arises for LIFU* without replications. The MLS provides only a saddle-point 
for the likelihood function which is unbounded (cf. Section 3.2.1.3). As above, 
the case V + 0 is included in a natural way. Q7.y is replaced by Qz.y.y and 
Sz.y by Sz.w. Starting from (22), Theil (1958) constructed the k-class estimators. 
Those we obtain from (20) for fixed 2 and k = 1 — 4. For q = 1 the known 
representations from the field of econometrics result if in Sy. yw, etc., the corres- 
ponding projectors are written. For instance, the 2SLS-estimator is obtained 
from 


bos = (XP,X')-1 (XB,Y’), 
Where. Pra J — Py, — (1 — 1) (L — Py): 


Under the assumption U’V = 0, which is possible without loss of generality 
according to (3.2.81), it results Py, — Py, = Py. Hence the representation 
commonly used in econometrics is obtained (cf. Farebrother, 1976).

However, even the k-class estimators so defined cannot be applied without
problems. Their most important representatives, the MLE and the 2SLS-estimator,
have no finite moments of higher order (Mariano and Sawa, 1972, 4). The
same holds for all k-class estimators with fixed constant k (Sawa, 1972). Thus
the variances of these estimators do not exist either. Consequently, their
accuracy cannot be compared on the basis of their variance, which is a simple
measure of concentration. Of course, there are justified arguments against
comparing only estimates with existing variance (Anderson, 1976, p. 8).

For instance, the probability of falling into an interval around the true 
parameter might also be chosen as a very informative concentration measure. 
An enlarged theory in this direction was developed for models with a finite 
number of parameters (cf. Weiss and Wolfowitz, 1977). For LIFU* where the 
number of parameters increases, asymptotic results were obtained for the 
mentioned estimates (cf. Section 3.5.2). 


3.3.1.6 Modified maximum likelihood estimation 


Of course, one also wants to have simple estimates available with finite variance. 
For that reason Fuller (1977) investigated modifications of the MLE and 2SLS 
estimator which have moments of higher order for sufficiently great sample size. 

It is clear that the moments of the estimators defined in (20) do not exist
if the inverse of the random matrix $Q_{X.U.V} - \lambda S_{X.W}$ becomes 'too often too
great', that is, if the matrix itself becomes 'too often too small'. This happens


if $\lambda$ is 'too often too great'. Thus a perturbation of $\lambda$ towards smaller values
could secure the existence of the moments. In fact this holds for the following
modifications: the MLE becomes a modified MLE,


bum a (Qx.vv — ASx.w)? (Qxv.0.v — ASxy.w)s (28) 
A= 4,(SzwQz.0.v) — alm(m —n), «> 0. 
The modified 2SLS estimator is 
bas = S¢/Sxv (29) 
with 
Sx =Qx.0v —AS8x0,  Sxy = Qxv.u.v — ASx.w (30) 


and with g := 1 and 4, = 4,(Sx,wQy.v.v); 


(iyi = = e2 
n je ge et fo pieces 4 Bist Gea 
7 m(m — n) m(m — n) 
| A ie Matai otherwise. 
r m(m — n) 


Both these modifications have the same bias up to order O(m-?) (Fuller,
1977).

Such modifications are formally defined also for q > 1. For the MLE of B,
however, it has to be taken into consideration that its explicit representation
in the form (20) is possible only for q = 1 (cf. Example 3.2.2 in Section 3.2.5).
The substance of the modification in (28) consists in the perturbation of $S_{Z.W}$
towards smaller values, whereas the modification of the 2SLS estimator arose
directly from (20). The comparison of the two modifications is facilitated
by their asymptotically equivalent bias. Thus, for q > 1, two different
possibilities of generalization are obtained.


A modified MLE for #* results as the eigenspace to the q smallest eigenvalues 
of 


S7w(Qz.u.v + AS, Wwe 


where 4 is an arbitrary positive constant. On the other hand, a modified k-class 
estimator can be obtained for £ = A((I, B]) from (30) if 4 is an arbitrary fixed 
or random number 2 = Ke m))- From (30), for g=1 the MLE result for 
1 = A,(SzwQz.v.v), whereas this is not the case for gq > 1. With the help of 
another approach it will be shown in Section 3.4 that for g > 1, too, a natural 
connection can be established between MLE, 2SLS, and other estimators. 



3.3.2 Linear functional relations with dependent errors 


Dependent errors occur most frequently in econometric time-series models,
and less often in so-called data-fitting problems in the natural and industrial
sciences. The treatment of such dynamic models is beyond the scope of this
introduction. On the other hand, with 'pure' data-fitting problems there will
from time to time be situations where dependences among the observations
cannot be excluded. Such dependences may arise from the experimental design
as well as from the observations. These models need not include such 'strict'
and explicit dependences as time-series models do from the outset. For that
reason it is desirable to have a practicable consistent estimator available for
problems in which less is known about the mechanism generating the
experimental design and about the process of observation. Of course, this method
could also be applied to time-series models. Such an estimator was constructed
by Robinson (1977). In particular, it includes regression models with independent
errors. To begin with, a heuristic introduction of this estimator is given.

We start from a LIFU~ where the stochastic quantities §;, dj, €; are all 
together independent and identically distributed in each case and have vanish- 
ing expectation. If one constructs the estimator as usual on the basis of the 
second sample moments — that is from Sz — then, in the limit as n — oo one 
has the covariance matrix of Z available to construct the consistent estimator 
(cf. also Section 3.2.1):

$DZ = \begin{pmatrix} D(\xi) + D(\delta) & D(\xi)B' \\ B\,D(\xi) & D(\varepsilon) + B\,D(\xi)B' \end{pmatrix}$. (31)


(For fixed experimental design one must demand the existence of the limit
$S_\xi \to D(\xi) =: D_\xi < \infty$. This demand is obvious since otherwise the experimental
design would be too dispersed. In that case no consistent estimator would be
possible on the basis of exclusively using the second sample moments $S_z$, since
the convergence of $S_z$ would be necessary.)

But now, even with knowledge of the $p(p + 1)/2$ elements of $DZ$ in the limit,
it would not be possible to determine the elements of $B$, $D\xi$, $D\delta$, $D\varepsilon$, since the
total number of parameters, $q(p - q) + (p - q)(p - q + 1) + q(q + 1)/2$,
exceeds this number of known second sample moments by $(p - q)(p - q + 1)/2$.
This indeterminacy was already reflected by the identifiability statements of
Theorem 3.1.1. Therefore in the model one has to assume additional information
just corresponding to the amount of information that is lacking.
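The counting argument can be checked mechanically. The following is a minimal sketch with hypothetical names: it counts the free elements of $B$ ($q \times (p-q)$), of the symmetric matrices $D\xi$ and $D\delta$ (each of order $p - q$) and of $D\varepsilon$ (order $q$), and compares the total with the $p(p+1)/2$ distinct second moments.

```python
def count_parameters(p, q):
    """Return (unknowns, moments, deficit) for dimensions p = dim(z), q = dim(y)."""
    r = p - q                      # dimension of x and of the design points
    n_b = q * r                    # elements of B
    n_d_xi = r * (r + 1) // 2      # distinct elements of D(xi)
    n_d_delta = r * (r + 1) // 2   # distinct elements of D(delta)
    n_d_eps = q * (q + 1) // 2     # distinct elements of D(epsilon)
    unknowns = n_b + n_d_xi + n_d_delta + n_d_eps
    moments = p * (p + 1) // 2     # distinct elements of the covariance of z
    return unknowns, moments, unknowns - moments

# Bivariate case: four unknowns (slope and three variances), three moments.
bivariate = count_parameters(2, 1)
```

In the bivariate case the deficit is one, which is why a single additional piece of information such as a known variance quotient suffices.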

For this, different possibilities have been discussed already: the utilization
of information from higher-order moments, instrumental variables, replicated
observations, etc. These methods start from a complete indeterminacy of all
parameters. But there are cases in which sufficiently many parameters are
exactly known; for example, the model with known variance quotients in
LIFU- is a classical one. In many practical problems it can be assumed that
$D\delta$ and $D\varepsilon$ are diagonal. This is true for many technical curve-fitting problems
where the measurement errors of the single variables are independent. Or
perhaps it can be assumed that the kth component of $\eta$ is not influenced by the
lth component of $\xi$, so that $b_{kl} = 0$ for some $l$. Finally, it could be known that
several variables are measured with the same instrument, in which case the
variance of the measurement errors would be taken as constant and the
corresponding variances $D\delta^{(i)}$ would be equal.

Now one has to check what this additional information implies for
identifiability. First, from $\Sigma_z$, by (31), for fixed $D\delta$ one can always
determine $D\xi$; the equation is $D\xi = Dx - D\delta$. Further conditions might
be given by an implicit equation

$0 = h(\psi)$, $\psi = [B, D\delta, D\varepsilon]$ (32)
(cf. Section 3.5.3) where in 
fi Ra (pg) g == Gp = 9)*, 8h (33) 


the trivial conditions of symmetry on $\Sigma_\delta$ and $\Sigma_\varepsilon$ are to be included. Beyond that,
at least $(p - q)(p - q + 1)/2$ additional conditions are necessary. (But recall
that the demand $D\delta, D\varepsilon \ge 0$ implies the known inequalities for certain
subdeterminants.)

The function h and $S_z$ may contain more information than is necessary for the
consistent estimation of the parameters. Such an estimation problem can be
solved by data-fitting and therefore by solving a corresponding maximization
problem. For this, functions which are similar to likelihood functions suggest
themselves for several reasons. One starts from a LIFU

$y_i = Bx_i + (\varepsilon_i - B\delta_i)$ (34)


formally written as a regression model. Inconsistency of the OLSE (cf. Section
3.1.2) was in essence based on the fact that the regressor $x_i$ and the error
$\varepsilon_i - B\delta_i$ are not independent. If one could succeed in generating a regression
model by parameter transformation, in which the regressors and the errors are
uncorrelated at least asymptotically, one would expect that the corresponding
MLE from the regression model becomes consistent. For this, put $\tilde B = B - \bar B$.
Then

$y_i = \tilde B x_i + \tilde\varepsilon_i$, $\tilde\varepsilon_i = \varepsilon_i - B\delta_i + \bar B x_i$. (35)

Therefore, with the usual abbreviations $S_{\xi\varepsilon}$, etc., for the sample cross
moments, we have

$S_{x\tilde\varepsilon} = S_{x,\varepsilon - B\delta} + S_x \bar B'$
$= S_{\xi\varepsilon} + S_{\delta\varepsilon} - S_{\xi\delta}B' - S_\delta B' + S_x \bar B'$. (36)


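In this equation the first three terms vanish asymptotically. If we put
$\bar B' = S_x^{-1} D(\delta) B'$, then this also holds for the last difference, and the
desired form of a regression model results. Hence we form the regression model

$y_i = \tilde B x_i + \tilde\varepsilon_i$,
$\tilde B = B - \bar B$, $\bar B = B\,D(\delta)\,S_x^{-1}$. (37)

Because of the independence of the $\xi_i$, $\delta_i$, $\varepsilon_i$, we have

$D\tilde\varepsilon_i = D\varepsilon + (B - \bar B)\,D\delta\,(B - \bar B)' + \bar B\,D\xi\,\bar B'$
$= D\varepsilon + B\,D\delta\,B' - 2B\,D\delta\,S_x^{-1}D\delta\,B' + B\,D\delta\,S_x^{-1}\Sigma_x S_x^{-1}D\delta\,B'$. (38)

If $\Sigma_x = Dx$ is approximated by $S_x$, then for $D(\tilde\varepsilon_i)$ one obtains approximately

$\tilde\Sigma = D\varepsilon + B\left( D\delta - D(\delta)\,S_x^{-1}D(\delta) \right) B'$. (39)

We put

$\psi = (B, D\delta, D\varepsilon)$. (40)

One obtains as quasi-likelihood function for the regression model (37),

$l_n(\psi) = -\log\det[\tilde\Sigma] - \operatorname{tr}\,\tilde\Sigma^{-1} S_{\tilde\varepsilon}$. (41)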


Robinson (1977) showed that $-l_n$ is a contrast function with parameter $\psi$.
Minimum contrast estimates are known to be consistent (e.g. Strasser,
1973) and, under additional assumptions, asymptotically normal. The essence of
the procedure developed by Robinson (1977) consists in showing the convergence
of $S_z$ toward the covariance given in (31) under weaker conditions than
that of simple independence of the $\xi_i$, $\delta_i$, $\varepsilon_i$, and then in utilizing the
properties of the function $l_n(\psi)$.

Now we give the construction of the estimator in a comprehensive form. 
The proof of consistency, some questions connected with asymptotic normality, 
and further explanations will follow in Section 3.5. 


By %, we denote the sequence of random variables dj, &;, Si, k <n, and by 
B5,, B,., those sequences which in addition contain (&,, § 1), (bn» §n) respectively. 
The model assumptions used until now will be weakened for explicit LIFU: 


$\eta_i = B\xi_i$, $x_i = \xi_i + \delta_i$, $y_i = \eta_i + \varepsilon_i$. (42)
| Assumption Al Let, for n > 2 almost surely, 
E(O,|Bs,)=9, Elen | B.,) = 9; (43) 




E(0,0;, | Bs,) = 23 < 0, (44) 
EEE, | B,,,) aa 224 <0, as (45) 


where the T,, may be nonrandom or random (dependent) matrices with the following 
convergence property : 


YS Tn Ty <0 (47) 
i=1 
(Remark: $T_n$ need not be regular!)


Assumption A2 For some c> 2 there is a constant K < co such that for all 
n= 1: 
Ells, Bild, Eile ll’ < K. (48) 


Assumption A3 Let 2, := 2;,+ 7) be regular. (We note in particular that 
the case 0 = 2, = Dé is possible, which gives the usual linear regression model 
with correlated errors.) We put 


y := (B, 2;, Z,) € R° (49) 


for BE Magy (pqs 25 © Mip—g) x(p-q): Le € Mgyq: Let h be a (sufficiently often) 
- continuously differentiable function in yp: 


Wiery tere Cee Peers (50) 
For fixed T we put X, := 2; + T 

Ag Rises a Aa (51) 

Q = Ay) + BX; — X;2 7125) B’ + Q, Qy = Q(yw). (52) 
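Assumption A4 Let $\Psi$ be a compact set of parameters $\psi \in \mathbb{R}^s$ for which

$h(\psi) = 0$, (53)

$\Omega(\psi) > 0$ (54)

holds and let the 'true' parameter $\psi_0 = (B_0, \Sigma_{\delta 0}, \Sigma_{\varepsilon 0})$ be contained in $\Psi$.
Furthermore, let

$S(\psi) = S_{y-Ax} = S_y - AS_{xy} - S_{yx}A' + AS_xA'$, (55)
$A = A(\psi) = B(I - \Sigma_\delta S_x^{-1})$, (56)
$\Omega = \Omega(\psi) = B(\Sigma_\delta - \Sigma_\delta S_x^{-1}\Sigma_\delta)B' + \Sigma_\varepsilon$, (57)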


and

$l_n(\psi) = -\log\det[\Omega] - \operatorname{tr}\,\Omega^{-1}S(\psi)$. (58)

Robinson (1974) defined as the estimator of the parameter that value $\hat\psi$ in $\Psi$
which maximizes $l_n(\psi)$.
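In the bivariate case all quantities in (55) to (58) are scalars, and the behaviour of the criterion can be inspected directly. The following is a minimal sketch under strong simplifying assumptions: $p = 2$, $q = 1$, the error variances are treated as known (playing the role of the restrictions $h$), and a grid search over a bounded interval stands in for the maximization over the compact parameter set. All names are hypothetical.

```python
import math

def quasi_loglik(b, var_delta, var_eps, s_x, s_y, s_xy):
    """Scalar version of the criterion (58)."""
    a = b * (1.0 - var_delta / s_x)                              # (56)
    omega = b * b * (var_delta - var_delta ** 2 / s_x) + var_eps  # (57)
    s = s_y - 2.0 * a * s_xy + a * a * s_x                        # (55)
    return -math.log(omega) - s / omega                           # (58)

def maximize_over_slope(var_delta, var_eps, s_x, s_y, s_xy,
                        lower=0.0, upper=4.0, steps=400):
    """Grid search for the slope maximizing the criterion (a stand-in optimizer)."""
    grid = [lower + (upper - lower) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda b: quasi_loglik(b, var_delta, var_eps, s_x, s_y, s_xy))

# Limiting moments for slope 2, design variance 3, error variances 1 and 0.5:
# s_x = 3 + 1, s_xy = 2 * 3, s_y = 4 * 3 + 0.5.
best_slope = maximize_over_slope(var_delta=1.0, var_eps=0.5, s_x=4.0, s_y=12.5, s_xy=6.0)
naive_slope = 6.0 / 4.0  # ordinary least squares slope from the same moments
```

With the limiting moments inserted, the criterion is maximized at the true slope 2, whereas the ordinary least squares slope from the same moments is 1.5.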


3.3.3 Nonlinear models with independent errors 
3.3.3.1 Modified least squares estimation 


As with linear models, there are also some possibilities for nonlinear models to
modify the available estimators. Here one has to bear in mind that the complete
solution of the nonlinear systems of equations developed in Section 3.2 for
computing the MLE and WLSE may be very difficult, since the dimension increases
proportionally to the number of observations. That is why one needs algorithms
which are easier to compute, even for sample sizes which are considered
small in practical problems. Such alternatives are based on linearizations of
the original problem and on using the first iterations for the WLSE starting
from an initial estimate. General iteration procedures will be treated in
more detail in Section 3.8.3. In the present section a modified Gauss-Newton
procedure is discussed in detail. Starting from a consistent initial
estimate $\hat\pi_1$ for $\pi$, one obtains asymptotic normality also under increasing
experimental design (cf. Section 3.5.4) with the first iteration $\hat\pi_2$. This
modification of the Gauss-Newton procedure was developed by Fuller and Wolter
(1982) for $d_y = 1$ and is based on the paper of Villegas (1969), in which replicated
observations of a fixed experimental design are treated. We consider the
explicit sequence model for m = 1, 2,...,


$\eta_{i0} = r(\xi_{i0}; \pi_0)$, $i = 1, \dots, n_m$, (59)
$x_{im} = \xi_{i0} + \delta_{im}$, $D\zeta_{im} = \Sigma_m$, (60)
$y_{im} = \eta_{i0} + \varepsilon_{im}$. (61)


We put $d_y = q$, $d_z = p$. The size $n_m$ of the experimental design increases with m:

$m = n_m l_m$, (62)

where $n_m = n(m)$, $l_m = l(m)$.
Assume that $\Sigma$ is known. Let

$\Sigma_m = k_m\Sigma$, $\Sigma > 0$, $l_m^{-1} = k_m = o(m^{-1/2})$. (63)

Then one obtains the relation

$n_m = m k_m = o(m^{1/2})$. (64)



Then, because of $\Sigma_m \to 0$ it also holds that $z_{im} \to \mu_i$. This model contains the
case of independent replications of a fixed experimental design with known
covariance $D\zeta = \Sigma$. Here n is fixed and for $i = 1, \dots, n$; $m_i = k_m^{-1} = l_m$, it
holds that

$k_m = n/m = O(m^{-1}) = o(m^{-1/2})$ (65)

and

$z_{im} = \bar z_{i\cdot}$, $Dz_{im} = \Sigma k_m = \Sigma/m_i$.
An initial estimator 2, of a» with 

m, — m= Op(n-¥?) (66) 


is the assumption of the procedure. Under weak assumptions such an estimate
is the OLSE in the nonlinear regression model

$y_{im} = r(x_{im}, \pi) + \tilde\varepsilon_{im}$

(cf. Section 3.5), where the $x_{im}$ are used instead of the $\xi_{i0}$. In the case of
replications one would have $y_{im} = \bar y_{i\cdot}$, $x_{im} = \bar x_{i\cdot}$.
Starting from $\hat\pi_1$ one constructs an iteration $\hat\pi_2$ similar to the Gauss-Newton
one. For this iteration asymptotic normality can be proved.
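For orientation, the ordinary Gauss-Newton iteration for the naive nonlinear regression, which can supply the initial estimate, may be sketched as follows. This is a minimal sketch assuming a single parameter and a user-supplied derivative; it is the plain procedure, not the modified one constructed in this section, and all names are hypothetical.

```python
import math

def gauss_newton(xs, ys, r, dr_dpi, pi_start, iterations=20):
    """Plain Gauss-Newton iteration for least squares with a scalar parameter."""
    pi = pi_start
    for _ in range(iterations):
        residuals = [y - r(x, pi) for x, y in zip(xs, ys)]
        jacobian = [dr_dpi(x, pi) for x in xs]
        # Linearize r around the current value and solve the resulting
        # linear least squares problem for the increment.
        step = (sum(j * e for j, e in zip(jacobian, residuals))
                / sum(j * j for j in jacobian))
        pi += step
    return pi

def model(x, pi):
    """Hypothetical response function r(x, pi) = exp(pi * x)."""
    return math.exp(pi * x)

def model_derivative(x, pi):
    """Derivative of the hypothetical response function with respect to pi."""
    return x * math.exp(pi * x)

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [model(x, 0.5) for x in xs]  # error-free data for illustration
pi_hat = gauss_newton(xs, ys, model, model_derivative, pi_start=0.3)
```

With errors in the $x$ values this regression is only a starting point; the modification described in the text additionally updates the estimated design points at every stage.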

With 2, an estimator ,), of the experimental design p;,), which lies on 

Sy,z 18 obtained by 

Le(M(n)1) = |l2(n) — Mnyrllo-2 = min |[%n) — Mmllo-a- (67) 

HE wry 


The estimator ,), is one of the stationary points of 1, which, because of 
Q=1©®& 2, result from the equations 


0 = Gl, = Ord (yi — ra) + 2°(yi — ri) 
iP Ord” (x; — &) + 2%"; — &,) (68) 
where 2,,! was partitioned into the blocks 2° etc., and 0,74, := O¢7;(&i,, ™). 


From the solutions of (68) one chooses those ones which minimize (67). With 
the Taylor-series expansion 


i= Ta + Orin(% — mm) + Orn(E; — fn) + BR (69) 
we obtain the following approximation of [,(u) with C(n); = 2n) — Mnyjs 


An; = x —7;,j = 0, 1 Cer AR pe 


L(, 0) = lle(ny — (ayn — (4 — Hnyi)llo— 

= UAE ays, Am). (70) 
Therefore the WLSE &,% is to be obtained approximately by the following 
relation: j 

1,(%, E(m)) = min 1,(2, E(m) © U(Aym, 49é) := min 1(Am, Agyi)- (71) 


%,P(n) Am, 48 ()1 


Then the second iteration $\hat\pi_2 =: \hat\pi_{GN}$ (GN for Gauss-Newton) for the WLSE $\hat\pi$
results from

$\hat\pi_2 = \hat\pi_1 + \hat\Delta_\pi$. (72)


The point ¢ny2 := ([Ei2. Ni2])i=1,...n = ([E:e, ri(Fie, 72)])i-1,....n is contained on 
Sy,x,- But with repeated application of (67) a ‘better’ estimate of the experi- 
mental design is obtained. The solution of (71) is obtained as in the linear 
regression model. With 0 = O,y_,),s,for fixed Am, the minimum over A€;, is 
the solution of 


pare Os Orin] A2é(47%) = Py 2 V2Cy, = [0} Oni] Am) (73) 
with 

£5 = KEV [L pg, Ori). (74) 
From the last equation it follows that 


Lf = AZ| —Ofry | Iy)). (75) 


Therefore 4,7 results from the extremal problem 


min ¥||Prx(Z"(Cq — [0 | Ara] Amy)lB,. (76) 


An, t=1 


This is a least squares minimization as known from linear regression theory. 
The solution A,z results from 


n 
Aan = gn — % = 2a DY Own 2 (Eu — Oe7nSin) (77) 
i=1 
with 
D2 — DS, (hry dq ra), (78) 
i=1 
Sa = F (om, Fa) = (—Ogrin tq) Zl —Opran | iy). (79) 


As one can show we obtain from (70) 
Sian — Sn = Zi"(L | Geran) Zizi — (Ea fri + Saran» Arm). (80) 


In these normal equations only the $(d_\pi \times d_\pi)$-matrix $\Sigma_\pi$ and the $(q \times q)$-matrices
$S_i$ are to be inverted. The computational effort corresponds to that of the
corresponding iterations for the solution of the WLSE equations as described in
Section 3.2.6. In the iteration procedure proposed here one has an estimate
$\hat\mu \in S_{\gamma,\hat\pi}$ of the experimental design at every stage. That need not be the case
for the corresponding procedures which solve the normal equations of the WLSE
for general implicit errors-in-variables models (EVM) (cf. Section 3.8.3).


3.3.3.2 Different covariances 


Finally we consider the model which results from the possibly unequal numbers 
of replications at the single experimental points. Then, in the general model 
(60) one has to put 


Doin af ain Se +3 ™;. (81) 


And

$m_0 = m/n$

is the average replication number in the experimental points. Now, all
formulas remain unchanged if in the corresponding terms $\Sigma_m$ is replaced by
$\Sigma_{i(m)}$. In the corresponding model with replications and possibly different
covariances we have

$\Sigma_{i(m)} = \Sigma_i/m_i$, $m = \sum_{i=1}^{n_m} m_i$. (82)


To obtain the desired asymptotic properties one has to demand that (63) and 
(64) hold for any point of the experimental design (cf. Section 3.5.5.): 


lim Xi(myMi = dj = 0, (83) 
m1 = 0(m- 2), tm = o(m2), (84) 


In contrast to (64) the second part of (84) is not a conclusion of the first part 
but an independent assumption. 


3.3.3.3 Unknown different covariances 


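We show how this procedure can be applied when replicated observations are
available, so that covariance estimates which are independent of the sample
means of the observations can be obtained. In the model with
$\Sigma = \operatorname{Diag}(I_{m_i} \otimes \Sigma_i)$,

$$\hat\Sigma_{i(m)} = S_i/(m_i - 1), \qquad S_i = \sum_{j=1}^{m_i} (z_{ij} - \bar z_i)(z_{ij} - \bar z_i)' \quad (85)$$

are independent and unbiased estimators of the covariances $\Sigma_i$ of the
$z_{ij}$, $j = 1, \dots, m_i$. The corresponding statement holds for equal
covariances $\Sigma_i = \Sigma$, $i = 1, \dots, n$, with

$$\hat\Sigma_m = S/(m - n), \qquad S = \sum_{i=1}^{n} S_i. \quad (86)$$

It is well known that these estimators converge to $\Sigma_i$ for $m_i \to \infty$, and
to $\Sigma$ for $m - n \to \infty$, respectively.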
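
The two estimators in (85) and (86) can be illustrated by a small sketch. It is only an illustration under stated assumptions: the function names (`scatter_matrix`, `covariance_estimates`) and the toy data are hypothetical and not part of the text, and the observations at each design point are simply stored as lists of coordinate lists.

```python
def mean_vector(rows):
    """Sample mean of the replicated observations at one design point."""
    k = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(k)]


def scatter_matrix(rows):
    """S_i = sum_j (z_ij - zbar_i)(z_ij - zbar_i)' as in (85)."""
    zbar = mean_vector(rows)
    k = len(zbar)
    s = [[0.0] * k for _ in range(k)]
    for r in rows:
        d = [r[j] - zbar[j] for j in range(k)]
        for a in range(k):
            for b in range(k):
                s[a][b] += d[a] * d[b]
    return s


def covariance_estimates(groups):
    """Return (per-point estimates S_i/(m_i - 1), pooled estimate S/(m - n))."""
    n = len(groups)                      # number of design points
    m = sum(len(g) for g in groups)      # total number of observations
    k = len(groups[0][0])
    scatters = [scatter_matrix(g) for g in groups]
    per_point = [
        [[v / (len(g) - 1) for v in row] for row in s]
        for g, s in zip(groups, scatters)
    ]
    pooled = [
        [sum(s[a][b] for s in scatters) / (m - n) for b in range(k)]
        for a in range(k)
    ]
    return per_point, pooled


# Toy data (hypothetical): n = 2 design points with m_1 = 2 and m_2 = 3 replications.
groups = [
    [[0.0, 0.0], [2.0, 2.0]],
    [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]],
]
per_point, pooled = covariance_estimates(groups)
```

Each design point needs at least two replications for the per-point estimate, and the pooled estimate needs $m > n$; the sketch does not check this.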


3.3. Further estimation procedures 295 


As one can see, the estimators f, which are obtained by the minimization of 
L= Zn). by Hla | 
On Diag (1, © Sian), , and Q= 1, os. 


(87) 


converge toward the same values as the estimators computed for fixed aim): 


3.3.4 Estimation with instrumental variables 
in linear functional relations 


3.3.4.1 Introduction 


In this section estimation using instrumental variables, which was treated
for bivariate linear functional relations in Section 3.1.1, is investigated
for a general multivariate model. We start by fixing a general model of a
linear functional relationship with nonrandom unobservable variables (LIFU⁺).
For this model we first prove some statements on parametrization, in
connection with the models considered previously (in Section 3.1.3.3), and on
the form of the maximum likelihood estimator (under normal distribution). The
results and notation of this section are fundamental for the asymptotic
treatment of the model presented later on (in Section 3.4).

Then, in such a general model, we consider estimation using instrumental
variables (IV). Here the connection between the LIFU⁺ model under
consideration and a corresponding model of a linear functional relationship
with random unobservable variables (LIFU⁻) turns out to be essential. In the
latter model we first represent the IV-estimator as a maximum likelihood
estimator in a LIFU⁻ model (under normal distribution) enlarged by the
inclusion of some random IV. This result then serves as a heuristic basis for
the construction of the IV-estimator in the LIFU⁺ model (for nonrandom
unobservable and instrumental variables). A special IV-estimator in the model
considered will be defined as a maximum likelihood estimator (under normal
distribution) for a certain (possibly inadequate) model setup. This definition
is justified by the preceding consideration; the MLE results as a special case
of the general IV-estimator so determined. The asymptotic properties of the
canonical instrumental variable estimator (CIVE) obtained in this way form the
subject of Section 3.4.

The models investigated in the sequel are always linear functional
relationships with random or nonrandom unobservable variables according to
Definitions 3.1.6 and 3.1.5, respectively (including the implicit case). We
always assume that the error variables $\varsigma_i$, $i = 1, \dots, m$, are independent
and identically distributed, and that in the case of LIFU⁻ the same holds for
the quantities $u_i$, $i = 1, \dots, m$, which are independent of the $\varsigma_i$,
$i = 1, \dots, m$. The observation index of the random variables that occur will
sometimes be suppressed.




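For the properties of the set $\mathfrak{L}_{k,l}$ of $l$-dimensional linear subspaces of
$\mathbb{R}^k$ confer [A 3.16]. For $\mathcal{L}_i \in \mathfrak{L}_{k,l_i}$, $i = 1, 2$, let
$\mathcal{L}_1 + \mathcal{L}_2$ denote the linear hull of $\mathcal{L}_1 \cup \mathcal{L}_2$;
$\mathcal{L}_1 \setminus \mathcal{L}_2$ the orthogonal complement in $\mathcal{L}_1$ of
$\mathcal{L}_1 \cap \mathcal{L}_2$; $\mathcal{L}_1^{\perp}$ the orthogonal complement in $\mathbb{R}^k$ of
$\mathcal{L}_1$. For measures $P$ and linear mappings $A$ let $AP$ denote the image of
$P$ under $A$.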


3.3.4.2 A general model for linear functional relations with nonrandom 
unobservable variables 


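Let $p, q, r, m \in \mathbb{N}$, $q \le r \le p$, $q < p$, $m \ge p - q$, and a linear space $J$,
$J \in \mathfrak{L}_{p,r}$, be given. Furthermore, let $\mathcal{P}^{\varsigma}$ be a set of probability
distributions $P$ over $[\mathbb{R}^p, \mathfrak{B}^p]$ with

$$\int \varsigma \, dP = 0_p, \qquad R\Big(\int \varsigma\varsigma' \, dP\Big) = J,$$

and let

$$\mathfrak{M} = \{ M \in M_{p \times m} \mid R(M) \in \mathfrak{L}_{p,p-q} \}.$$

We consider a distribution model (LIFU⁺) for an observable random $p \times m$
matrix $Z$, described by the following relations:

$$Z = M + \varsigma, \qquad M \in \mathfrak{M}, \qquad \varsigma = (\varsigma_i)_{i=1,\dots,m}, \quad \varsigma_i \sim P^{\varsigma} \in \mathcal{P}^{\varsigma}.$$

Thus the observable matrix $Z$ decomposes into the error matrix $\varsigma$ (with
independently and identically distributed columns) and the parameter matrix
$M$, on which we have the information $R(M) \in \mathfrak{L}_{p,p-q}$.

We consider the problem of estimating the derived parameter

$$\mathcal{L} := R(M).$$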


The included distribution model $\mathcal{P}^{\varsigma}$ is as yet unspecified (except by the
above assumptions). In the following we consider submodels of (LIFU⁺) and
specifications induced by further restrictions on $\mathfrak{M}$ and certain assumptions
on $\mathcal{P}^{\varsigma}$. These models will be indicated by listing the corresponding symbols,
as in (LIFU⁺), (R), (N).

In Section 3.1.3.2 (for the case $r = p$, i.e. of a regular $D\varsigma$) the model
assumption $R(M) \in \mathfrak{L}_{p,p-q}$ was interpreted as a linear functional
relationship, and correspondingly the model (LIFU⁺) as a model with errors in
variables. If $R(D\varsigma)$ is assumed known and different from $\mathbb{R}^p$, then
(disregarding a set of measure zero in the sample space) the function
$P_{J^{\perp}} M$ of $M$ becomes known also; in particular certain components of
$\mu_i$, $i = 1, \dots, m$, may become known (if $M = (\mu_i)_{i=1,\dots,m}$).


For the case of a general known $J = R(D\varsigma)$ considered here, we now introduce
an additional assumption (R), to be justified immediately by an example:
(R) $\mathcal{L} \in \{ \mathcal{L} \in \mathfrak{L}_{p,p-q} \mid \mathcal{L} + J = \mathbb{R}^p \}$.


Example 3.3.1 (Inhomogeneous linear functional relationship) Let r,q,m € IN, 
r>q,m =r anda set P* of probability distributions P over [IR", 8*] with 


ear 0;, KR f aan’ dP) = Rt 
be given. Let 
M™* = {MEM , | SH ER! A(M — E14) € Q,,_,}. 


We consider a distribution model for an observable random (r X m) matrix Z, 
described by the following relations: 


Z, = M, + &* 


oF = CFint.ms S*OQP™ 


(for independent ¢7, 7 = 1,...,m). We consider the problem of estimating the 
parameter 


(£*, E) := (RM — El/,), B). 


If we put M, = (i);-1,..m> then the m;, 7 = 1,...,m lie on an unknown 
affine manifold and all components of 2; are only observable subject to error 
(since 7[D$*] = r). This model can be construed as a special case of (LIFU*), 
(R), if we pat p= 7 or 1,Z= (lj, | Ze], M = [Ui Mo), c = [01 xm $*], 


J = R([0) x tZ,]) 


and consider the resulting distribution model for Z. Namely, for L** € M,,q; 
R(L*1) = £* it holds that 


LPB TM == 0, a 
r[M] = dim &(M) = dim (—#} J,) R(M) + dim (1 Oy xr) R(M) 
= dim £* + dim AY) =r—qtl=p—gq. 
As r|[—E’ ve) Tie = q, we have 
£ = AM) = A —EF' ;1,] L*+)' = ((-#’ ABREU eee 


According to part (a) of [A 1.6], £ varies in Kt if (¥*, #) varies in ¥,,,-¢ XR’. 
With this the question of identifiability of the parameter $(\mathcal{L}^*, E)$ is
answered, too. Obviously $\mathcal{L}$ is an identifiable parameter function (cf.
Remark 3.3.5 below), and using part (a) of [A 1.6] one easily shows that the
equality w.r.t. $\mathcal{L} = ([-E \,\vdots\, I_r]' \mathcal{L}^{*\perp})^{\perp}$ is a maximal equivalence
relation (in the sense of Bunke and Bunke, 1986, definition 1.5.1).
Consequently we can confine ourselves to estimating the parameter $\mathcal{L}$.


Obviously, for the case of a regular $D\varsigma$ (i.e. $r = p$), the general condition
(R) represents no restriction. If (R) is violated, then $r[Z] < p$ holds almost
surely for any observation size $m$, and by transforming into $R(Z)$ a
dimensionality reduction of the data becomes possible.


Remark 3.3.1 It can be shown that (R) eliminates a set of parameters $\mathcal{L}$
which is closed in $\mathfrak{L}_{p,p-q}$ and of measure zero (in the sense of the
invariant measure on $\mathfrak{L}_{p,p-q}$; cf. [A 3.16]).


In this section we mainly consider the case of normally distributed errors:
(N) $\mathcal{P}^{\varsigma} = \{ N_p(0_p, \Sigma) \mid \Sigma \in M_p^{\ge},\ R(\Sigma) = J \}$.


We consider a further condition (Ex) which in the following leads to explicit
models. Let a linear space $J_0$, $J_0 \in \mathfrak{L}_{p,q}$, with the property $J_0 \subseteq J$ be
given. The condition is


(Ex) $\mathcal{L} + J_0 = \mathbb{R}^p$.


(Ex) thus implies (R); in the case $r = q$, (Ex) and (R) coincide.
We assume that


$$J = R([0'_{(p-r) \times r} \,\vdots\, I_r]')$$
(if $r < p$) and
$$J_0 = R([0'_{(p-q) \times q} \,\vdots\, I_q]').$$

Remark 3.3.2 In the model (LIFU⁺), (R), (N) or (LIFU⁺), (Ex), (N), this
does not imply a restriction of generality. Indeed there is always an orthogonal
transformation $O \in M_{p \times p}$ such that $OJ$, $OJ_0$ have the above form; with the
transformed observation $OZ$ and parameter $O\mathcal{L}$ one obtains a model in which
the above assumption is satisfied.


If (Ex) is assumed, then we speak of an explicit model. Indeed, according to
part (b) of [A 1.6], (Ex) with the above form of $J_0$ is equivalent to
$\mathcal{L}^{\perp} = R([-B \,\vdots\, I_q]')$ for a $B \in M_{q \times (p-q)}$; hence for
$M = (\mu_i)_{i=1,\dots,m}$ it holds that

$$(-B \,\vdots\, I_q)\, \mu_i = 0_q, \qquad i = 1, \dots, m,$$
or
$$\eta_i = B \xi_i, \qquad i = 1, \dots, m, \quad (88)$$

for $\mu_i = [\xi_i' \mid \eta_i']'$, $\xi_i \in \mathbb{R}^{p-q}$, $\eta_i \in \mathbb{R}^q$, $i = 1, \dots, m$. The linear
functional relationship $\mu_i \in \mathcal{L}$ between the components of $\mu_i$ can thus be
written in the explicit form (88).
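
The passage from the implicit constraint $(-B \,\vdots\, I_q)\mu_i = 0_q$ to the explicit form (88) can be checked numerically on a small example. The following sketch is purely illustrative; the helper names (`mat_vec`, `explicit_point`, `constraint_residual`) and the numbers are assumptions introduced here, with $q = 1$ and $p - q = 2$.

```python
def mat_vec(a, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


def explicit_point(b, xi):
    """Build mu = [xi' | eta']' with eta = B xi, i.e. a point satisfying (88)."""
    return list(xi) + mat_vec(b, xi)


def constraint_residual(b, mu):
    """Evaluate (-B | I_q) mu = eta - B xi for mu = [xi' | eta']'."""
    k = len(b[0])                 # k = p - q, the dimension of xi
    xi, eta = mu[:k], mu[k:]
    b_xi = mat_vec(b, xi)
    return [eta[i] - b_xi[i] for i in range(len(b))]


# Hypothetical example with q = 1, p - q = 2, so B is a 1 x 2 matrix.
B = [[2.0, -1.0]]
xi = [3.0, 4.0]
mu = explicit_point(B, xi)                     # lies in the subspace L
on_relation = constraint_residual(B, mu)       # vanishes on L
off_relation = constraint_residual(B, [3.0, 4.0, 5.0])
```

A point constructed via (88) gives a zero residual, whereas a point off the relation does not; this is all the equivalence asserts for a fixed $B$.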

A further model assumption considered in the following is 


(Aw) R(M") S wW € ae 


for a given n-dimensional linear subspace @ of R™, p —q¢ <n <™m. By this 
it becomes possible to take into account certain restrictions on M, i.e. the 
case where a model (LIFU*) is combined with a linear regression model. For 
this we write (A) if @ is given and fixed. 


Remark 3.3.3 For the model (LIFU⁺), (R), (A) to be nonempty, it is obviously
necessary that $n \ge p - q$. This condition is also sufficient: with
$L \in M_{p \times (p-q)}$, $R(L) = \mathcal{L}$, and $W \in M_{(p-q) \times m}$, $R(W') \subseteq \mathcal{W}$, it holds that

$$LW \in \mathfrak{M}, \qquad R(W'L') \subseteq \mathcal{W}.$$

That is why we always assume $n \ge p - q$ in what follows.


Let us now introduce some further notation. 


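(i) For $A \in M_{k \times l}$, $k, l \in \mathbb{N}$, let
$$L_A := [I_l \,\vdots\, A']', \qquad L_A^{\perp} := [-A \,\vdots\, I_k]', \qquad \varrho(A) := R(L_A).$$
Obviously $L_A^{\perp\prime} L_A = 0_{k \times l}$ and $\varrho(A) = (R(L_A^{\perp}))^{\perp}$ are valid. The mapping
$\varrho : \bigcup_{k,l \in \mathbb{N}} M_{k \times l} \to \bigcup_{k,l \in \mathbb{N}} \mathfrak{L}_{k+l,l}$ is injective. For further
description see part (a) of [A 3.16].


(ii) According to part (b) of [A 1.6], (Ex) is equivalent to
$\mathcal{L} \in \varrho(M_{q \times (p-q)})$; then let

$$B := \varrho^{-1}(\mathcal{L}).$$

This parametrization of $\mathcal{L}$ is used in the explicit case. Furthermore, let
the parameters $B_i$, $i = 1, 2$, be defined by

$$B = (B_1 \,\vdots\, B_2), \qquad B_1 \in M_{q \times (p-r)}, \quad B_2 \in M_{q \times (r-q)}$$

(in the case $q < r < p$; for $r = q$ or $r = p$ let $B_1 := B$ or $B_2 := B$,
respectively).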
(iii) In the case r < p let 


J = E+ 


Or x (p-r) ? 


ees 
J var, 1 Dae 


and for r = p let 
Bh owe 




furthermore, let 


topline Os 


ax(p—g) * 


Dg 2= Longo? 
Therefore for J, Jy from (R), (Ex) it holds that 

JI=KRJ), Jy = RLh). 
Then, for BE Max (p-q) 

L3=1,+ LB, DS a Dik ad Oo (89) 


Let us now introduce distributional specifications (V,), » = 1, 2, 3 under which 
the model is to be investigated. Let Py be a set of probability distributions P 
over [IR?, $?] with 


JEP Se ine tl ad Pe Oye a | ek dita 
(iv) Fora LE M=, R(L) = J, let 


Ve = {2 € ME | X, = 07D, a? > O} 
Vg := {Z, € ME | R(Z;) = J}. 
Let the specifications (V,), » = 1, 2, 3 be given by 
(V Pre | ca), Pea Pe), y= 1,253. 
Let us introduce some further notation: 
(v) Me JM | Tf eit LY Lp ot*=—JS; 
in the case r < p let 
MJ 
(vi) For § from the determination of (LIFU*), X from (iv), let 
pS, ap id Lied. 
of = tr [2;]/tr [2}. 


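Remark 3.3.4 In the model (LIFU⁺), in the case $r < p$, it almost surely holds
that $J^{\perp\prime} Z = M_1$, because of

$$E J^{\perp\prime} Z = M_1, \qquad D J^{\perp\prime} Z = (I_m \otimes J^{\perp\prime})(I_m \otimes \Sigma)(I_m \otimes J^{\perp}) = 0_{m(p-r) \times m(p-r)},$$

i.e. the submatrix $M_1$ of $M$ is observable.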




Henceforth, if it is convenient, we will identify the model (LIFU⁺) with the
resulting distribution model for $Z_2$, given $M_1$.


Remark 3.3.5 In the model (LIFU⁺), $M = EZ$ and $\mathcal{L} = R(EZ)$ are
identifiable (for a distribution model $\{P_\theta, \theta \in \Theta\}$ a parameter function
$\gamma : \Theta \to \Gamma$ is called identifiable in the model if

$$\forall\, \theta, \tilde\theta \in \Theta : \quad P_\theta = P_{\tilde\theta} \;\Rightarrow\; \gamma(\theta) = \gamma(\tilde\theta); \quad (90)$$

cf. Bunke and Bunke, 1986, definition 1.5.1). Hence this also holds in (LIFU⁺),
(R), and in the other submodels and specifications considered here.


Remark 3.3.6 In the case $r = q$ the model (LIFU⁺), (R) is explicit, and in
(88) the $\xi_i$, $i = 1, \dots, m$, are observable ($M_1 = (\xi_i)_{i=1,\dots,m}$) according to
Remark 3.3.4. Thus a linear regression model results. In the formal derivations
we restrict ourselves to the case $q < r < p$ unless stated otherwise; all results
carry over, with obvious simplifications, to the cases $q = r$ and $r = p$, and
sometimes they are used in this form.


In the model (LIFU⁺), (R), (A), (N), $(V_\nu)$ one can consider the problem of
maximum likelihood estimation of the parameters for $\nu = 1, 2, 3$; the
distribution of $Z_2$ has a Lebesgue density. In the following this is worked out
in more detail. It turns out that for sufficiently large $m$ (condition $(B_\nu)$)
there almost surely exists an MLE $\hat{\mathcal{L}}^{(\nu)}$ of $\mathcal{L}$ based on the continuous
normal density, and likewise an MLE of the parameters $\sigma^2$, $\Sigma$. To this end the
connection with the models considered in Section 3.2 is specified in the
following. The results of Sections 3.2.3 to 3.2.5 will be used.

In accordance with Section 3.2.2, one then obtains the weighted least squares
estimator (WLSE) by dropping assumption (N). A further generalization to
the instrumental variable estimator is now achieved by cancelling requirement
(A) in the distribution model for $Z$, while still obtaining the estimator as the
WLSE from a formal model setup (LIFU⁺), (R), (A), $(V_\nu)$. This setup is thus
possibly inadequate. The space $\mathcal{W}$ used in restriction (A) represents the
'instrumental variables'.

An intuitive basis of this estimation procedure can be found in the connection
of the model (LIFU⁺) with a model with random unobservable variables
(LIFU⁻). Roughly speaking, (A), i.e. $M' = W'K'$ ($W' \in M_{m \times n}$, $R(W') = \mathcal{W}$),
can be replaced by $M' = W'K' + c$ provided that $c$ is asymptotically negligible
in a certain sense. This corresponds to a condition of nonvanishing correlation
between the unobservable variables and the observable instrumental variables
represented by $M$ or $\mathcal{W}$. A later paragraph is devoted to justifying this
procedure in detail by the connection to model (LIFU⁻).
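
The heuristic just described can be made concrete in the simplest bivariate situation ($p = 2$, $q = 1$, a single instrument). The sketch below is not the CIVE of the text but the classical ratio form of the IV slope estimator, used here only as an illustration; all names (`simulate`, `naive_slope`, `iv_slope`), the design, and the error variances are assumptions made for this example. The unobservable $\xi_i$ are nonrandom and depend on the instrument $w_i$ up to a small deviation, and both coordinates are observed with error.

```python
import random


def simulate(m=21000, beta=2.0, seed=0):
    """Nonrandom xi_i tied to an instrument w_i; both coordinates observed with error."""
    rng = random.Random(seed)
    w, x, y = [], [], []
    for i in range(m):
        w_i = float(i % 7 - 3)                  # instrument, values -3, ..., 3
        xi_i = 0.5 * w_i + 0.1 * (i % 3 - 1)    # xi = 0.5 * w + small deviation
        w.append(w_i)
        x.append(xi_i + rng.gauss(0.0, 1.0))    # observed xi, with error
        y.append(beta * xi_i + rng.gauss(0.0, 1.0))  # observed eta = beta * xi, with error
    return w, x, y


def naive_slope(x, y):
    """Least squares through the origin, ignoring the error in x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def iv_slope(w, x, y):
    """Classical IV ratio: sum(w_i * y_i) / sum(w_i * x_i)."""
    return sum(a * b for a, b in zip(w, y)) / sum(a * b for a, b in zip(w, x))


w, x, y = simulate()
slope_naive = naive_slope(x, y)
slope_iv = iv_slope(w, x, y)
```

In this design the naive slope is attenuated toward roughly half of the true value $\beta = 2$, while the IV ratio stays close to it. The ratio is only meaningful when $\sum_i w_i x_i$ is clearly away from zero, which mirrors the nonvanishing correlation condition mentioned above.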

The estimator $\hat{\mathcal{L}}^{(\nu)}$ thus obtained in a model (LIFU⁺), (R), $(V_\nu)$ will be
referred to as canonical instrumental variable estimator (CIVE). From the
alternatives to the MLE (cf. Section 3.3.1) existing in the model (LIFU⁺),
(R), (A), (N), $(V_\nu)$, alternative instrumental variable estimators result in an
analogous way. In this framework an asymptotic comparison is then carried
out, in the explicit case, in Section 3.4. The comparison of the MLE and its
alternatives results as a special case.


Before proceeding to a detailed development let us give some more basic 
notation. 


(vii) For A € Meter) let 
area dray 


F, isalwaysa regular lower triangular matrix; for A; € M,.(p_y,% = 1,2, 
it holds that 


sr lig x de Bee Fy a Y Sarg 
OY AS dir ceed ens Ue lead Be Uae ae 


(viii) For linear spaces Y,€ 2s, %= 1, 2 and matrices X €Mixm, Sir t, 
my € IN, + = 1,2, let 


Sgii= XX’ 
Qr.y, = XPyX', Sx.y, = XP y.X' 
Oxyry, = XPyyyX: 
Here, on the left-hand side of the last three terms, J/; is allowed to be 
replaced by matrices Y; if R(Y4) = Y;. 
(ix) For 2 from (iv) let 


SO 3S 25) ya 1, 2"SO= (m —n) Sz~ 


Oo” := mQz.% —nm 8”, »=1,3 
OP := mQz.% — nm 2, 
E = Z,M; 


A 


BO (Pet Sry Ls (eel a rape 
(x) Let 
COST SOs; 
O35” := mQz,.0-u, — nm 3S*, y= 1,2,3. 
(xi) For A ¢ Mt! (for the definition see [A 1.2]) let 1,.(A) denote the (uni- 


quely determined) eigenspace corresponding to the J greatest eigenvalues 
of A 




(xii) For a random variable £ with values in & let 
TOL) = LAE SDY LY (Lt € Mpg, RLY) = £4, 
Vi== el eo. 


According to [A 3.14], R(S®) = J holds almost surely, so that accord- 
ing to [A 1.3], almost surely 


L!S™L! ¢ Ke 
(xiii) Let assumptions (B,), v = 1, 2, 3 be defined by 
(B,) n= |_ 


(Bs) m=n+r. 


a 


In the following the models under $(V_\nu)$, $\nu = 1, 2, 3$, will mostly be treated
jointly, often suppressing the index $\nu$ in the notation of (ix), (x), and others.


3.3.4.3 Maximum likelihood estimation in linear functional relations 
with nonrandom unobservable variables 


Let us consider the models (LIFU⁺), (R), (A), (N), $(V_\nu)$, $\nu = 1, 2, 3$, under the
assumption $q < r < p$. First we give a representation of the model which
establishes the connection to model (3.1.38). For this recall Remark 3.3.4.
Let $M_1 \in M_{(p-r) \times m}$ be given. Because of (A) we have $R(M_1') \subseteq \mathcal{W}$.


(xiv) For £ € % let 
My = (My € My xm | RLM + Me)) = £, RM, | MY) SY}. 


For L¥t ¢€ M2 of CMn xg UC Utnxmii =v (p —*); Vien 
let 
Mint z.uv i {iM < lave | Im, c Ale uM, € Mx (pany) 
Ms, — M,.U+ i.V, L*' i, = 05 xn, L*’' MM, = £. 
Lemma 3.3.1. The model (LIFU*), (R), (A) as a distribution model for Z, 
can be described by the following relations: 
Z, ca M, + (e: 
M,€ U U Mis z.00n (91) 


L*+eMs, LEM xa 


eee) ee COS SAP ER Pe 


(for independent Oy, i = 1,..., m), af 

OEM ms M=n—(p—7), AU')=WN AM). 
Proof. Note that the model (LIFU*), (R), (A) can be described by 


Z, = M,+* 
ais : (92) 
M,€U My 

LER 


with $* as above. Let £ = Fz(J+ + J£*) be a representation of £ according 
to part (a) of [A 1.6] for 2 € M,y(p-», £* € &,-4. Now, in view of [A 1.7] 
we find that MN) = V@azyy if AL“) = f+, Lf = —2I™, V = Mi;,, 
R(U') = W \ AK(M}) is satisfied. Hence (92) can be written as 


2 
M, € WU) Wy) Mies, er L*t,U,My> 


De eM EeM,x(p-r) 
and with {—E’L**| Be Mx (p—n} = Mip-r)xq the assertion follows. IM 


This representation of the model now leads to a model (3.1.38). Thus under 
specifications (N), (V,) we can exploit the results of Sections 3.2.3—3.2.5 on 
maximum likelihood estimators (MLE) in models of this type. From the MLE 
of the parameter M, one then derives the MLE of the parameters L*!, £ and 
from this, using the relation of My and Mi«1,z.y,4, according to [A 1.7] one 
obtains the MLE of the parameter £ (cf. Section 3.2.2). 


In the following lemma we give a relationship of the statistics introduced 
under (viii), (ix) and (x). There we refer to the model (LIFU*), (R); in addition 
let MW € Quin be given with n => p —q, RM) SW. 


Lemma 3.3.2 In the model (LIFU⁺), (R) it holds that
Qz.y = Fe(I*Sy, J" + JQz,.w.n,J') F's 
Oo = Fa(Jim Sy J" + IQ") Fs 

if RM{) SW € Qnny rn =p — q is fulfilled. 


Proof. Apply [A 1.5] for A =Qz.y, Au =Sy,, Ao = 2M}, Av = Qz,.y- 
Here A(A) + J = RP? is satisfied because of 7[A,,] = 7[J!’M]) =p —r 
({A 1.3])., The decomposition for Qp results from 


RQ — mQz.w) S I 
and F;J=—J. 


Theorem 3.3.1 In a model (LIFU⁺), (R), (A), (N), $(V_\nu)$ let the assumption
$(B_\nu)$ be satisfied. Then it holds that




(a) An MLE # of £ based on the continuous density for Z, exists almost surely 
and is a.s. uniquely determined. It is given by 


2 = Fa(d* + JL*), B= 7Z,M+ (93) 
and the almost surely valid relation ; 

£* = S¥%y,,_ (S*-HQES* ah), (94) 
Here S* € M7 and S*-12 QFS* 1? € M"—1 almost surely hold. 
(b) 2 = Qok Boe (95) 


is valid for 
Ey = (P5. + [S*}"”) F5'- 


(c) The relations (93), (94) and (95) remain valid if $Q_0^*$ (correspondingly $Q_0$
according to Lemma 3.3.2) is replaced there by $Q_0^* + \lambda S^*$, where $\lambda$ is an
arbitrary real random variable.

(d) Under (V,), v = 1, 2 there almost surely exist MLE 6? and & of o; and &,, 
respectively, based on the continuous density for Z,; these are a.s. uniquely 
determined. They are given by 


6? = (mr) tr [S*Sz.y + WL) Qz.¥) (96) 
S = mm — n)S + SIL) (OM (L) S + nm-18) (97) 
(IT (2) from (xii)). 


Proof. (a) We consider the representation of the model (LIFU⁺), (R), (A)
according to Lemma 3.3.1 in connection with the distributional assumption
implied by (N), $(V_\nu)$. This is just a model of the form (3.1.38), for which
the problem of maximum likelihood estimation was treated in Sections 3.2.3
to 3.2.5. There a somewhat different notation was used. The notation in the
model according to Lemma 3.3.1 and (v), (vi), and (x) on the one hand, and that
used in (3.1.38) and in Sections 3.2.3 to 3.2.5 on the other hand, correspond
as follows:


(3.1.38), ‘ = 
Section 3.2 p ng ZM,M, 2X T+ mm=n) 18 mQ—n(m—n)I8 


Section 3.3.4 r p—r Z, M, M, SF L*+ S* Qo”? 


The other notations used in the following coincide. From Sections 3.2.4 and
3.2.5 one can see that the conditions $(B_\nu)$ ensure boundedness of the likelihood
function and existence of the MLE of the parameters. Let the resulting MLE
of the parameters $L^{*\perp}$, $M_2$ be denoted by $\hat L^{*\perp}$, $\hat M_2$.




Under (V3), from (3.2.67), (3.2.68), and (3.2.70), one obtains 
R(L*+)+ ae S*2y pat mg yS*'e) ; 


From Sections 3.2.3, 3.2.4, one can see that this formula also holds under (V,), 
y = 1, 2 (with the determination of S* according to (x)). Here we have almost 
surely 


S*-U2m-1Q7,.yS*-? e mela) 


since, as shown in Sections 3.3.3 —3.2.5, a.s. r[m719z,.y] 27 —q and S* « Mr 
and since the r — q + 1 largest solutions of (3.2.68) are a.s. different. 
Now 


igs mney ys?) Pais Nrr—g(S* 2 2Qs S*-U2) 


is valid, and for M, one obtains from (3.2.76) 


~*~ 


M, = Z,V*. (98) 


With Lemma 3.3.1 and [A 1.7] the assertion follows. 
(b) Taking into account 7[M,] = 7[J/’M] = p — r and [A 1.3], from Lemma 
3.3.2 one concludes that 


Ob Lol = FaItm Sy J + JQIS* J) Fy ZL 
= FalJ¢m-8y,J" + IQ3S*J') (I+ + TL*) 
= Pa(J+ + JP) = 2. 
(c) The assertion w.r.t. (94) and thus to (93) follows from 
tesa S*UAQ3S* A) = th gf S*AQES* + AT,) 
= tra St Q5 + AS*) S*1"). 


As above, from this one obtains the assertion w.r.t. (95). 


(d) By M,, 62, &* we denote the MLE of the parameters M,, o7, X; in the model 
according Lemma 3.3.1, with the distribution assumption caused by (N), 
(V,), » = 2, 3. We consider the specification (V3) and initially use the denota- 
tions of Section 3.2.5. (3.2.68) and (3.2.71) imply 


m&te£(I, + Dy) Dal’ = HLL, + Dy) 
x LYQLAT, + Dy) Lee 
= Pgnpi8-?Q8 PP gags. 
From this and from (3.2.73) one obtains 


Ape ae mASU2P g144 1 S-U2QS-12P eins SU2, 




consequently, in the denotations of (x), 

>* = (m —— n) m18* 4. SUP gspop er S*U2OF S*-U2P cing gs STH? 

+ NM S*FU2P corp er S¥U2 (99) 

According to part (a) of [A 1.6] and (a) above, for f+ := Fy, JL* we have the 
relation R(L+) = #+. By virtue of Lemma 3.3.2 one obtains 

LY’ Qofht = LF g(J4m8y, J" + JOR’) Fi? = L*Q3h* , (100) 

SLi = JS*J'L+ = JS*J'F IL* = JS*t*, (101) 

iY’Si! = £*'s*7*, (102) 


As with (a) one obtains the MLE & of 2; in the model (LIFU*), (R), (A), (N), 
(V3). Now 2. = JX7J’ holds due to (vi); from (99)—(102) one obtains the 
assertion for © = JD*J’. 

Now we consider the specification (V.). From Section 3.2.4 one obtains 


6? = (mr)? tr [S*-(Z, — M,U — MV) (Z, — M,U — M,V)’, 
which implies, under consideration of (98), 

6? = (mr) tr [S*-(Sz,. + (Z. — MU) PyAZ, — M,UY)|. (103) 
Under (V3) (in the notation of Section 3.2.5) from (3.2.73) one obtains 

tuet sir. 
therefore (3.2.74), (3.2.75) imply 

B=, —2 (L214) it) ZU. 


(Here $U^{+}$ denotes the Moore-Penrose inverse; see Bunke and Bunke, 1986,
[A 1.17].)

It can be shown (cf. the proof of Theorem 3.2.9) that this formula obtained 
under (V;) is also valid under (V,) (2 nonrandom). Section 3.2.4 implies its 
validity also for (V.) (for & = S8*)), One thus obtains 


M, cs (i, as S* RL DAL Se het) 2 L*+) Z,U+ 
and 

(Z> = M,U) Py: = S*Y2P ceusperS* ?Qz, yS*?P gerngesS?. 
Hence (103) gives 

6? = (mr)-+ tr [S*-187 + P cwthpni Sta Oz ye oe) 


= (mr)-* tr [S*187. wy ie (L*1'S*f*+)-1 LO; gud}: 




Analogously to (100) one obtains from Lemma 3.3.2 for L+ = Fy JL* that 
b'Qz.yh4 = L*'Qz,0.u Lh", 
which establishes together with (102) the assertion for 67. 


Let the MLE of $\mathcal{L}$ obtained under the specification $(V_\nu)$ be denoted by $\hat{\mathcal{L}}^{(\nu)}$.


Remark 3.3.7 Under (V2) Qo is a function of the unknown parameter o;; 
then statement (c) for 2 = 1 — of provides the form of the MLE as a function 
of the observations. Let Z, ~ P%* € (LIFU*), (BR), (A), (N), (Ve), (Bz); o% be 
the corresponding parameter value; ? be defined as above; and ?™ be the 
MLE of ¥ under (Vj) for V; = {025}. Then (c) implies ?© = #®), 


Remark 3.3.8 According to Remark 3.3.6 one easily obtains the form of the
MLE in the case $r = p$: we almost surely have

$$\hat{\mathcal{L}} = S^{1/2}\, N_{p,p-q}(S^{-1/2} Q_0 S^{-1/2}).$$

Hence, in the case $r = p$, $S^{-1/2} \hat{\mathcal{L}}$ is the eigenspace belonging to the $p - q$
largest eigenvalues of $S^{-1/2} Q_0 S^{-1/2}$. In the case $q < r < p$, Theorem 3.3.1(b)
provides a corresponding statement: $E_0 \hat{\mathcal{L}}$ is an eigenspace of $E_0 Q_0 E_0'$,
which however does not necessarily correspond to the $p - q$ largest eigenvalues.
But this is the case for certain realizations of $Z$.
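
For $r = p = 2$, $q = 1$ the eigenspace construction of Remark 3.3.8 reduces to a single direction in the plane and can be written in closed form. The sketch below is an illustration under simplifying assumptions that are not part of the text: the weight matrix $S$ is taken to be diagonal, the symmetric $2 \times 2$ matrix playing the role of $Q_0$ is passed as its three distinct entries, and the function names are hypothetical.

```python
import math


def principal_direction(a11, a12, a22):
    """Unit eigenvector for the largest eigenvalue of the symmetric matrix [[a11, a12], [a12, a22]]."""
    theta = 0.5 * math.atan2(2.0 * a12, a11 - a22)
    return math.cos(theta), math.sin(theta)


def estimated_direction(q, s_diag):
    """Direction spanning S^{1/2} N(S^{-1/2} Q S^{-1/2}) for diagonal S = diag(s1, s2).

    q = (q11, q12, q22) holds the entries of the symmetric matrix Q.
    """
    q11, q12, q22 = q
    s1, s2 = s_diag
    # Entries of S^{-1/2} Q S^{-1/2} for diagonal S.
    v1, v2 = principal_direction(q11 / s1, q12 / math.sqrt(s1 * s2), q22 / s2)
    # Map the eigenvector back with S^{1/2}.
    return math.sqrt(s1) * v1, math.sqrt(s2) * v2


# With S = I the estimated line is the principal axis of Q (orthogonal regression).
d_identity = estimated_direction((2.0, 1.0, 2.0), (1.0, 1.0))
# A rank-one Q with range spanned by (2, 1)' and unequal weights.
d_weighted = estimated_direction((4.0, 2.0, 1.0), (4.0, 1.0))
```

In the second call $Q$ has rank one, and the resulting direction is the range of $Q$ whatever the weights, as one would expect when only one direction carries information. If the two eigenvalues coincide the direction is not unique; the sketch does not treat this case.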


Corollary 3.3.1 Let (Ai), a= 1,.:.,p, 4S... SA, be the ordered p-tuple of 
the eigenvalues of EyQ ok}. If 


Aq —< Amin 1S y,], 
then 
Se ocr BS Np, p-q(BoQoLs)- 


Proof. As almost surely S* € IM? it almost surely holds that r[E)] = p: Theo- 
rem 3.3.1 (a), part (a) of [A 1.6], and Lemma 3.3.2 imply 


E,? bye + Ip pg S*-Y2QES*-12) 


= RI*m Sy I") + 1p. Ps EoQoEPs)- 
Now 
E,Q.k) = J¢m8y J" + Pk QokiPs 


establishes the assertion. 


Remark 3.3.9 In the model (LIFU*), (R), (A), (V,), i.e. in the model without 
normality assumption, the estimator £” remains almost surely defined by 
virtue of assumption P”: < y, and the relation u, << P” valid under (N). In 
accordance with the explanations of Section 3.2.2 we call ?” a weighted least 
squares estimator (WLSE) in this case. 




Remark 3.3.10 The statements of Theorem 3.3.1 as well as the following ones
carry over to the explicit case, i.e. the case where (R) is replaced by (Ex).
Indeed this means the introduction of an additional restriction in the model:
$\mathcal{L} \in \varrho(M_{q \times (p-q)})$. But according to Remark 3.2.4 it already almost surely
holds that $\hat{\mathcal{L}}^* \in \varrho(M_{q \times (r-q)})$ (under the assumptions of Theorem 3.3.1);
therefore, according to part (c) of [A 1.6], one almost surely obtains $\hat{\mathcal{L}}$ in
'explicit form': we almost surely have $\hat{\mathcal{L}} \in \varrho(M_{q \times (p-q)})$. Then, for the
estimators $\hat B_i$, $i = 1, 2$, of the parameters $B_i$, $i = 1, 2$ (cf. (ii)), it almost
surely holds that


B, pals e-(S*2y, .(S*-2Q38*-12)) 
and 


A 


B, —— Is E. 
For the estimator B of B one obtains almost surely 


B= o\(Fa(J* + JZ*)). 


3.3.4.4 Estimation using instrumental variables in linear 
functional relations 


To provide a heuristic basis for instrumental variable estimation in the model
(LIFU⁺), its connection to a model with random unobservable variables, i.e.
LIFU⁻ (cf. Section 3.1.5), is of importance.


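Let $p, q, r, m \in \mathbb{N}$, $q \le r \le p$, $q < p$, $m \ge p - q$, and a linear space
$J$, $J \in \mathfrak{L}_{p,r}$, be given. Furthermore, let $\mathcal{P}^{\varsigma}$ be a set of probability
distributions $P$ over $[\mathbb{R}^p, \mathfrak{B}^p]$ with

$$\int \varsigma\, dP = 0_p, \qquad R\Big(\int \varsigma\varsigma'\, dP\Big) = J,$$
and let
$$\mathfrak{M}^{\Sigma} = \{ \Sigma \in M_p^{\ge} \mid R(\Sigma) \in \mathfrak{L}_{p,p-q} \}.$$


We consider a distribution model (LIFU⁻) for a sequence $\{z_i\}_{i=1,\dots,m}$ of
independent random $p$-vectors $z_i$, $i = 1, \dots, m$, which is described by the
following relations:

$$z = u + \varsigma, \qquad u \sim P^{u} \in \{ N(0_p, \Sigma_u) \mid \Sigma_u \in \mathfrak{M}^{\Sigma} \}, \qquad \varsigma \sim P^{\varsigma} \in \mathcal{P}^{\varsigma}$$

(for independent $u$, $\varsigma$). We consider the problem of estimating the derived
parameter
$$\mathcal{L} := R(\Sigma_u).$$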




The assumption $R(\Sigma_u) \in \mathfrak{L}_{p,p-q}$ can be construed as an assumption of a linear
functional relationship between the components of the (partially) unobservable
random vector $u$.

Obviously, the model (LIFU- ) corresponds to the model (LIFU*) if (2;);—1,.. ms 
(Ui);-1,..m 18 put into correspondence to Z, M (cf. comment to Definition 


satisfied. However, a normal distribution family is distinguished here in a 
natural way. 

As in the model (LIFU⁺) we will make the additional assumptions (R), (N),
$(V_3)$ and also consider explicit models (LIFU⁻) (those which satisfy (Ex)).
As above, the special form of $J$ and $J_0$ is assumed. Remark 3.3.2 remains valid
in the models (LIFU⁻), (R), (N) and (LIFU⁻), (Ex), (N), respectively, in
particular because of the normality of $P^{u}$.

Analogously to the case of (LIFU⁺), the assumption $R(D\varsigma) = J$ has the effect
that certain components of the random vector $u$ are observable in the case
$r < p$. In the case $r = q$ a reduction to a regression model results (with
stochastic regressors; compare Example 3.4.5 below).

Now, the randomness of $u_i$, $i = 1, \dots, m$, assumed here in contrast to
(LIFU⁺), has the effect that the parameter $\mathcal{L}$ need not be identifiable in the
model (in the sense of (90)).


Theorem 3.3.2 There exist $p, q, r \in \mathbb{N}$, $q \le r \le p$, $q < p$, so that for any
$m \ge p - q$ the parameter $\mathcal{L}$ is nonidentifiable in the model (LIFU⁻), (R), (N),
$(V_3)$ (cf. Remark 3.3.5).


Proof. From the proof of Theorem 3.1.1 (Section 3.1.4) one can see that its
statement remains valid if it relates to a model (3.1.5), (3.1.6) with $\alpha = 0$,
$E\xi = 0$ (homogeneous case). But this model is a special case of (LIFU⁻), (Ex),
namely for $r = p = 2$, $q = 1$. Then the modified assertion of Theorem 3.1.1
states that the parameter $B = \varrho^{-1}(\mathcal{L})$ (according to (ii)) and thus $\mathcal{L}$ are
nonidentifiable in the model. Hence this also holds in the model with (R)
instead of (Ex).
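
The nonidentifiability in the case $r = p = 2$, $q = 1$ can be seen directly from the covariance of $z$. The sketch below exhibits two parameter values with different slopes that induce the same zero-mean normal distribution; it assumes, for illustration only, uncorrelated errors with variances `s1`, `s2` (a special case of an unrestricted regular error covariance) and a normal $\xi$ with variance `tau2`. The names are hypothetical, and exact rational arithmetic is used so that the equality is not blurred by rounding.

```python
from fractions import Fraction as F


def covariance_of_z(b, tau2, s1, s2):
    """Covariance of z = (xi + e1, b*xi + e2)' for xi with variance tau2 and
    uncorrelated errors e1, e2 with variances s1, s2 (all independent of xi)."""
    return (
        (tau2 + s1, b * tau2),
        (b * tau2, b * b * tau2 + s2),
    )


# Two different parameter values (hypothetical numbers) with different slopes b.
theta_a = dict(b=F(1), tau2=F(1), s1=F(1), s2=F(1))
theta_b = dict(b=F(3, 2), tau2=F(2, 3), s1=F(4, 3), s2=F(1, 2))

cov_a = covariance_of_z(**theta_a)
cov_b = covariance_of_z(**theta_b)
same_distribution = cov_a == cov_b   # zero-mean normal laws coincide iff covariances do
```

Since both laws are zero-mean normal with the same covariance, no sample size can distinguish slope $1$ from slope $3/2$ here, which is condition (90) failing for the parameter $\mathcal{L}$.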


Therefore identifiability of $\mathcal{L}$ in a model (LIFU⁻), (R), (N), $(V_3)$ has in
general to be guaranteed by additional information or modified model
assumptions. The cases mainly considered in the literature are the following
ones:


(A) In comparison to (N), $(V_3)$, additional information is assumed with respect
to $D\varsigma$, i.e. (N), $(V_1)$ or $(V_2)$ or another restricted class of normal
distributions. For this we refer to Section 3.2.1.

(B) Under (N), $(V_3)$, $\mathcal{P}^{u}$ is fixed as a class of nonnormal distributions (cf.
Theorem 3.1.1, Section 3.1.4). Then the minimum distance method yields
consistent estimators (Wolfowitz, 1952, 1953, 1957). Other procedures have
been given by Neyman (1951), Rubin (1956), Spiegelmann (1979).

(C) Method of instrumental variables. Here an enlarged model is considered
instead of the additional information mentioned in (A).




As in Section 3.3.1 consider independent observations $w_i$, $i = 1, \dots, m$, of an
instrumental variable (IV) $w$, here assumed to be random. For $n_1 \in \mathbb{N}$ let $w$
take values in $\mathbb{R}^{n_1}$. For the present, uncorrelatedness of $w$ and $\varsigma$ is
required. In the following we proceed according to Remark 3.3.6. It can be
shown that the statement of Theorem 3.3.2 remains valid under $q < r < p$, too.

We introduce the following notation: 


(xv) Wo := [Uta], %:=[ule], ns=m+p—r 
OW i= [ef Jt’) at [ur J1’2] 
(Because of (DG) = J) it almost surely holds that J+’ = J*’z.) 


(xvi) For random vectors v;, 7 = 1, 2, let 


Cf , f 
E,,1= Dvy — Enp, = Eve}, 
Paton K 
Ba enki ig ys eed Sh hae 


We consider a distribution model for the sequence {2 ;};_1,.m of independent 
random (n, + p)-vectors 2;, 7 = 1, ..., m, which is described by the following 
relations: 


% = Wo + [0n, 15] 

Uo ©) Pre := 1 n.+0( Ono» ((24)) 24-73) | ((24))§243 © ye ? 
((2,))fz33 « Te}, 

SOF 


(for independent jt), $). We consider the problem of estimating the derived 
parameter 


£ := A(((24))}=33) 


(P". is nonempty: ife.g. £ = Ki ((Xi;))j=35) € Kt holds, then 24, € M>_, follows 
according to [A 1.5], and a completion to ((2;;))=15 € My becomes possible.) 
One recognizes that with the denotations of (xv) {#;};-1,... satisfies a model 
(LIFU-) where condition (R) is fulfilled by virtue of 2. € Nt>_, and [A 1.5]. 
Accordingly we denote the above model by (LIFU⁻), (IV), (R); it represents
the enlargement of the model (LIFU⁻), (R) which arises by considering the joint
distribution of $z$ and the instrumental variable $w$.

In the model, due to (xv), (xvi) it holds that ¥,, = ((2i;))j=73, 2, = ((24,) ESS; 
then parameters 2,,,., X,,., and further ones are defined as well. Here 2, € My 
is assumed for simplicity. 




Now we investigate identifiability of £ in the model (LIFU-), (IV), (R),
(N), (V3). For this we note that the distribution of % is determined by 2,, 
and that, because of 2’, = 2, the relation R(2,,) S # holds. As 2 is a 
function of 2,, it follows that £ is identifiable in the model (LIFU-), (IV), (R),
(N), (V3) with the additional assumption


(id) [2uw) =p — 4. 
(The representation (104) below and [A 3.17] imply that the model (LIFU-),
(IV), (R), (N), (V3), (Id) is nonempty.)

In the following we will also show the necessity of this condition in the sense
that in any submodel of (LIFU-), (IV), (R), (N), (V3), £ is identifiable only
if (Id) is satisfied there.


(xvii) For £ € ¥ let 


peeee 


ME = {2 € ME, |S = ((2y)) rs, (Za) HTS € MZ, 


ALN? DS (Dyas =a St ot, AU) mer, Itt) ed 


My = {MEN <1 MM) SL, IMM = (On un, ipa 


Obviously the model (LIFU-), (IV), (R), (N), (V3) for the sequence {2 ;};-1,. m 
of independent observations can be written as 


%o © (Na sp(Onr5, 27.) | 2z, © U WG}. 
LER 


With that we can now make precise the statement on the meaning of condition 
(Id) for the identifiability of £ in the model (LIFU-), (IV), (R), (N), (V3).
Obviously (Id) is equivalent to 7[2,,.] = p — q. 


Theorem 3.3.3 Let & be a nonempty subset of U It. In a model 
FER 
%y © {Pry | LEM, 0B, F ER 
for 


Psy = Neer Omans 2) 
the parameter £ is identifiable if and only if it holds that: 
Ry ~ Wap Oeryr= speec ts 20 = gt OL.) ee, 


(i — (Osa | I,) 1) 
imply 
1220) = p — gq. 


Proof. It was already shown that the condition is sufficient for identifiability. 
To prove necessity we assume the existence of 2, £ with xz, € Me n , 




£ ER, such that 7[2,,] <p —q holds (with the above meaning of Z,,). 
We have to show the existence of £’€ R, £’ + F so that Qn M2 on M3. is 
nonempty. Let a mapping 


fs OM, > MZ X Moxa X MZ 
LER 
be defined by: if 2 ~ Nasp(On+p, Xz), 2, € Me holds, then let f(2,,) 
= (Ly, LX’ Xz.) (with the above meaning of w, 2). According to [A 3.17] it 
suffices to prove f(M%}) n f(Mz-) n f(Q) + BW. Now for f(2,,) = (Ly, M, Lew) 
the relation r[M] < p —q holds due to the assumption, and according to 
[A 3.17] we have M € M4. Since 9 is open and because of part (g) of [A 3.16] 
an £’ eR, £’ + £ with Me M4 always exists. Now [A 3.17] implies (2,,, 
M, Zw) € f(M-) and thus the assertion. 


Remark 3.3.11 Condition (Id) is thus the minimal (in the sense of Theorem 
3.3.3) identifiability condition for £ in the model (LIFU-), (IV), (R), (N), (V3). 
It represents a generalization of a condition of nonvanishing correlations of 
unobservable and instrumental variables (cf. (3.3.10)). One can see that w, 
which includes J+’u, functions as an instrumental variable. Now condition 
(Id) is assumed in the sequel. 
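
The role of a nonvanishing correlation between the unobservable variable and the instrument can be illustrated numerically. The following is a minimal sketch, not the estimator of Theorem 3.3.4: it assumes a scalar model x = ξ + ζ1, y = βξ + ζ2 with random ξ and a scalar instrument u that is correlated with ξ but independent of the errors, and it uses the textbook ratio form of a simple IV estimator, which may differ in detail from (3.3.9). All names and numerical values are hypothetical.

```python
import random

random.seed(0)
m, beta = 20000, 2.0

sum_ux = sum_uy = sum_xx = sum_xy = 0.0
for _ in range(m):
    xi = random.gauss(0.0, 1.0)             # unobservable variable (random here)
    u = xi + random.gauss(0.0, 1.0)         # instrument: correlated with xi, independent of the errors
    x = xi + random.gauss(0.0, 1.0)         # xi observed with error
    y = beta * xi + random.gauss(0.0, 1.0)  # linear relation observed with error
    sum_ux += u * x
    sum_uy += u * y
    sum_xx += x * x
    sum_xy += x * y

beta_iv = sum_uy / sum_ux   # simple IV estimate of beta
beta_ls = sum_xy / sum_xx   # naive least squares slope, attenuated by the error in x
cov_ux = sum_ux / m         # sample analogue of E[u * xi]; must stay away from zero

print(round(beta_iv, 3), round(beta_ls, 3), round(cov_ux, 3))
```

In this setup the IV ratio settles near β, while the naive slope settles near β/2 because the error variance in x equals the variance of ξ. If u were uncorrelated with ξ, the denominator `sum_ux / m` would tend to zero and the ratio would be unstable, which is the scalar counterpart of a violated rank condition.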


Now we proceed to obtain the maximum likelihood estimator of £ in the 
model (LIFU-), (IV), (R), (N), (V3), (Id). First we give a further equivalent
representation of the model where we denote the statement of a family of 
conditional distributions for z under the condition w = w by #|w = w ©). 


Lemma 3.3.3. The model (LIFU-), (IV), (R), (N), (V3), (Id) for the sequence
$\{z_{0i}\}_{i=1,\dots,m}$ of independent random $(n_1 + p)$-vectors $z_{0i}$, $i = 1, \dots, m$, can be
described in the following way:


% = [wi J’z] 


w © {Nn(On, Zo) | Xn € Mr} 
eh (104) 
z|w = w®© {N,(Mw, 2.) | Mé U My o MPZ4,, 
LER 


ere Valo w € IR”. 
Proof. The mapping given by 
P* > (P™, {P#=" | w € R%}) 
for 
aD (Eas Once) 205 2 =(0pxn, | Lp) % 


priw=4 -— Ni (Sake Us Xew) > w € IR" 


is injective. With the help of [A 3.17] one infers the parametrization stated 
where (Id) is equivalent to Me Mpy%. 




It turns out that for realizations $w_i$, $i = 1, \dots, m$, the family of conditional distributions
$P^{J'z \mid w = w_i}$, $i = 1, \dots, m$, just generates a model (LIFU*) with the assumptions
considered previously. Thus the results on maximum likelihood estimation
obtained there can be made available for the present model.


Theorem 3.3.4 In a model (LIFU-), (IV), (R), (N), (V3) let assumption
(B3) be fulfilled. Then it holds that:

An MLE $\hat{\mathcal{L}}$ of £ based on the continuous density for $(z_{0i})_{i=1,\dots,m}$ almost surely
exists and is almost surely uniquely determined. For a realization $\{z_{0i}\}_{i=1,\dots,m}$, $\hat{\mathcal{L}}$
is given by the MLE in a model (LIFU*), (R), (A), (N), (V3) according to
Theorem 3.3.1 if


Wy = One 
a RU ((wi)imt,...m)') 


($w_i = (I_n \mid 0_{n \times r})\, z_{0i}$, $z_i = (0_{p \times n_1} \mid I_p)\, z_{0i}$, $i = 1, \dots, m$)

is put there.

Proof. Let P*!”=” be defined for w € IR” according to Lemma 3.3.3; then PY #!?=” 
<r, w € IR” holds. Let py,” be the continuous density; then for the con- 
tinuous densities of 2) and w (with X = 2,,) it holds that 


p(w, t) = pyri (wt) = ph (w) py tr—"(t), — (w, t) € IR" x Re. 


From the above formula one recognizes in which way the marginal density of
$w$ and the conditional density of $J'z$ depend on the parameter $\Sigma$ of the joint
density. According to Lemma 3.3.3, $\Sigma_w$ and $(M, \Sigma_{z.w})$ vary independently in
the model, and £ is a function of $M$. Furthermore, we recognize that for realizations
$\{w_i\}_{i=1,\dots,m}$ with $r[(w_i)_{i=1,\dots,m}] = n$, the expression $\prod_{i=1}^{m} p^{J'z \mid w = w_i}(t_i)$ represents
the continuous density of $Z$ in a model (LIFU*), (R), (A), (N), (V3) with
$\mathcal{W} = \mathcal{R}(((w_i)_{i=1,\dots,m})')$. Furthermore, for such $w_i$, $i = 1, \dots, m$, there exists
$\max_{\Sigma_w \in M_n^{>}} \prod_{i=1}^{m} p^{w}(w_i)$. As $r[(w_i)_{i=1,\dots,m}] = n$ almost surely holds, the assertion
follows with Theorem 3.3.1. ∎


Here the MLE in models (LIFU-), (IV), (R), (N) is mainly presented to 
serve as a heuristic basis of estimation by means of instrumental variables in 
models (LIFU*). For a generalization of the model considered here, Robinson 
(1974) obtained the MLE from a heuristically based minimization principle 
and gave an asymptotic treatment, where results of Zellner (1970) and Gold- 
berger (1972) occur as special cases. Related problems were treated by Izenman 
(1975). Here we confine ourselves to stating that (LIFU-), (IV), (R) represents 
a model of independent identically distributed observations, for which general 
conditions can be specified (see [A 2.9]) which ensure consistency, asymptotic 
normality, and asymptotic efficiency of the MLE. 




3.3.4.5 Estimation using instrumental variables in linear functional relations 
with nonrandom unobservable variables 


For determining the instrumental variables estimator in models (LIFU*) 
now the general connection with models (LIFU-) is crucial (cf. Section 3.1.5). 
We consider a model (LIFU*), (R), (N), (V3). According to Section 3.2.5 the
likelihood function is unbounded here (cf. Lemma 3.3.1, Theorem 3.2.8, and
the remark thereafter); this corresponds to the nonidentifiability of £ in the
model (LIFU-), (R), (N), (V3) (Theorem 3.3.2). We consider a solution of the
estimation problem by means of instrumental variables, analogously to the
case of (LIFU-).

The random IV $u$ is now replaced by a nonrandom matrix $U \in M_{n_1 \times m}$
additionally given in the model (LIFU*), called an IV-matrix in the following,
with appropriate properties. Let


$W := [U' \mid M'J^{\perp}]' \in M_{n \times m}$, $n := n_1 + p - r$.


Analogous to the case of LIFU- the entire known nonrandom matrix W will
function as an IV-matrix in the following (cf. Remark 3.3.11). The nonrandomness
of U assumed here (with random ζ) corresponds to the uncorrelatedness
of ζ and u required in the case of LIFU-.
In accordance with Theorem 3.3.4 we define the IV-estimator as an MLE according
to Theorem 3.3.1 in a formal model setup (LIFU*), (R), (Aw), (N), (V3)
for $\mathcal{W} = \mathcal{R}(W')$.
More precisely, assume that Z follows a distribution model (LIFU*), (R), (N),
(V3). Consider the additional restriction (Aw), not present in this model, and
estimate £ by a formal MLE under this restriction, with observations Z.
According to Remark 3.3.3, at first a certain condition is to be imposed on
$n = \dim \mathcal{W}$ to ensure the existence of the so defined IV-estimator. Remark
3.3.13 below then implies the almost sure existence of this estimator as soon 
as in the model (LIFU*), (R), (Aw), (N), (V3) the MLE according to Theorem 
3.3.1 almost surely exists. Then condition (Id) has its counterpart in certain 
asymptotic requirements which ensure consistency. These requirements can be 
interpreted as conditions of an ‘asymptotically nonvanishing correlation’ 
between unobservable and observable variables represented by M and W 
respectively, and thus they are analogous to condition (Id) in LIFU- (cf.
Remark 3.3.11). 


Remark 3.3.12 A formal analogue of (Id) is
$r[MW'] = p - q$. (105)
The indicated procedure is heuristically justified by the fact that the IV-estimator
in LIFU+ under (V3) is shaped after the MLE resulting from Theorem
3.3.4 for a model (LIFU-), (IV), (R), (N), (V3), (Id). This construction now
is to be carried over to specifications (Vν), ν = 1, 2. A justification for this will
be given in Section 3.4, where the asymptotic efficiency in a certain sense of
the obtained estimator is shown under (Vν), ν = 1, 2, 3.


Remark 3.3.13 For P%: € (LIFU*), (R), (N), (Vν), Ps € (LIFU*), (R), (Aw),
(N), (Vν), it obviously holds that P# << P%.


Analogous to Remark 3.3.9 we now drop assumption (N) in the adequate 
model, and summarizing we give the following definition. 


Definition 3.3.1 Let Wy € Quin M2p—q with R(M;)— Wo. Let the 
model (LIFU*) (R),(Aw,), (V,) be adequate for Z,. Let a linear space W © &m,ns 
P—-F SNM, with R(M{)— WKH W, be given, so that condition (B,) is 
satisfied. Let $\hat{\mathcal{L}}(\cdot)$ be the (almost surely defined) maximum likelihood estimator for
£ in a model (LIFU*), (R), (Aw), (N), (Vν) according to Theorem 3.3.1. The
estimator which is almost surely defined by


$\hat{\mathcal{L}}_C := \hat{\mathcal{L}}(Z)$


we call the canonical instrumental variables estimator (CIVE) of £ for the linear
space $\mathcal{W}$. Under (Vν), ν = 2, 3, let estimators $\hat\sigma_C^2$ and $\hat\Sigma_C$ of $\sigma_\zeta^2$ and $\Sigma_\zeta$, respectively,
be defined analogously.


The weighted least squares estimator (WLSE) according to Remark 3.3.9
and the MLE according to Theorem 3.3.1 are special cases for $\mathcal{W} = \mathcal{W}_0$ and
(N), respectively. If (Ex) holds in the adequate model, then, according to
Remark 3.3.10, the CIVE $\hat{B}_C$ of B is obtained by


Bo := o Lo) : 


The linear space $\mathcal{W}$ will be referred to as the IV-space.

In the examples of Section 3.4.3 it will be proved that the simple IV-estimator
(3.3.9) and further ones (BLUE in the linear model, 2SLS-estimators)
result as special cases for certain dimensions p, q, r, n.

Beside the CIVE one can consider alternative instrumental variable esti- 
mators resulting in a natural way from alternatives to the MLE in the model 
under (Aw), (N) (cf. Section 3.3.1). The asymptotic comparison then forms the 
main subject of the following Section 3.4 where also the alternative estimators 
are discussed in some more detail (Section 3.4.5, Examples 3.4.10—3.4.12). 


3.4 Asymptotic theory for linear functional 
relations with nonrandom unobservable variables 
and with independent errors 


3.4.1 Introduction 


This section presents some asymptotic properties of estimators in the model 
LIFU* in the case of independent identically distributed error variables. For 
this we will rely on the approach to estimation by means of instrumental 




variables developed in Section 3.3.4. The most important LIFU* models 
treated in Section 3.1.3 are also included. 

In the asymptotics of LIFU+ models it is the crucial feature that in the 
general case the problem of an indefinitely increasing number of unknown 
incidental parameters occurs. This distinguishes the present model from more 
common parametric models of mathematical statistics, in particular from the 
model of independent identically distributed observations, as well as from 
the linear model where the incidental parameters (the regressors) are known. 
The question of consistent estimability of the structural parameter, which 
arises in this connection, was treated in Section 3.1.5. There it turned out 
(Theorem 3.1.6) that for the consistent estimability in these models certain
restrictions are necessary with respect to the unknown parameters. 

This will be treated here in some more detail. Sufficient conditions will be 
stated for the consistency of the canonical instrumental variables estimator 
(CIVE) defined in Section 3.3.4. These conditions concern either the distribution
model for the errors or the information on the incidental parameters
(provided by instrumental variables or their generalization to be introduced 
here). 

The main part of this section, however, deals with the problem of efficiency 
of estimators. We ask whether the CIVE is asymptotically efficient against 
the alternative estimators of Section 3.3.1, and in particular against the 2SLS- 
estimator (more precisely against its analogue; see Example 3.4.10). 

Answering this question we must also take into account the specifics of the 
present model, i.e. the possibly indefinitely increasing number of the unknown 
incidental parameters. In a model of independent identically distributed ob- 
servations (under certain regularity assumptions) the MLE is an asymptotically 
efficient estimator (cf. [A 2.9]). Such a statement would also be of importance
here since the MLE under normal distribution is a special case of the CIVE. 

The model treated here fits into the general scheme of independent, not 
necessarily identically distributed observations with a structural parameter to 
be estimated (cf. Remark 3.4.1 below). Under general assumptions, local 
asymptotic normality (cf. [A 2.7]) for a fixed sequence of incidental parameters 
can be proved for such a model, and thus a lower bound can be established
for the limit covariance matrix of asymptotically normal estimators (Andersen,
1970; Philippou and Roussas, 1973; Ibragimov and Khasminski, 1979).
But with unknown incidental parameters the MLE attains this bound only 
under restrictive model assumptions (Hoadley, 1971); in general this is the case 
with known incidental parameters. The latter forms the basis for the theory 
of asymptotic efficiency in simultaneous equation models of econometrics (cf. 
Theil, 1971; Schönfeld, 1971) as well as in the linear model (Philippou and
Roussas, 1973; Nussbaum, 1977). But in general the MLE does not attain the 
lower bound in question and the problem of efficiency remains open. 

We propose a solution by considering a special class of asymptotically normal 
estimators (asymptotic $Q_m$-estimators) and investigating optimality within




this class (in the sense of the covariance matrix of the limiting distribution). 
This procedure is similar to that in the linear model, where in certain model- 
specific classes of estimators optimality statements are obtained, also in an 
asymptotic sense, by means of the Gauss-Markov theorem (Bunke and Bunke, 
1986, chap. 2). There is also a relation to the class of asymptotic minimum 
contrast estimators in the case of independent identically distributed obser- 
vations, within which the MLE can easily be obtained as optimal (Michel 
and Pfanzagl, 1971). 

Let us now sketch for introductory purposes the underlying principle in the 
case of the simplest model. We consider the two-dimensional model of a homo- 
geneous linear functional relationship 


$x_i = \xi_i + \zeta_{1i}$
$y_i = \beta \xi_i + \zeta_{2i}$, $i = 1, \dots, m$   (1)


with unknown structural parameter β, unknown incidental parameters $\xi_i$,
$i = 1, \dots, m$, and independent identically distributed normal error variables
$\zeta_i := (\zeta_{1i}, \zeta_{2i})'$, $i = 1, \dots, m$. According to Section 3.1.5 additional information
on the incidental parameters or the error distribution is necessary for the
consistent estimation of 8. A practically important assumption of this kind 
consists in the setup of a ‘model with replicated observations’ (see (3.1.4) and 
Example 3.4.7 below) 


y= 64+ oy 
Yip Ply Soe Ps, EO) iS ee 


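in which a consistent estimate of $D\zeta$ can be obtained. For simplicity of presentation
let us assume that in model (1) a consistent estimator $\hat\Sigma_m$ of $D\zeta$ (of
the ANOVA type) is given. Then for the statistic


$$Q_m := m^{-1} \begin{pmatrix} \sum_{i=1}^{m} x_i^2 & \sum_{i=1}^{m} x_i y_i \\ \sum_{i=1}^{m} x_i y_i & \sum_{i=1}^{m} y_i^2 \end{pmatrix} - \hat\Sigma_m$$

under the assumption $m^{-1} \sum_{i=1}^{m} \xi_i^2 \to h > 0$ as $m \to \infty$ we have the relation

$$Q_m \xrightarrow[m \to \infty]{P} h \begin{pmatrix} 1 & \beta \\ \beta & \beta^2 \end{pmatrix}.$$   (2)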


On the basis of this relation various consistent estimators can be constructed.
But it is not clear which of these should be preferred. We have seen that the theory
of the MLE cannot answer this question.
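
The convergence in (2) can be checked by simulation. The sketch below makes simplifying assumptions that are not part of the text: the incidental parameters cycle through the values 1, 2, 3, the two error components are independent with common known variance, and this known error covariance is subtracted in place of an ANOVA-type estimator. The variable names are hypothetical.

```python
import random

random.seed(1)
m, beta, sigma = 30000, 0.5, 0.5

# Nonrandom incidental parameters 1, 2, 3, 1, 2, 3, ...
xis = [1.0 + (i % 3) for i in range(m)]
h = sum(v * v for v in xis) / m   # m^{-1} * sum of xi_i^2, equal to 14/3 here

sxx = sxy = syy = 0.0
for xi in xis:
    x = xi + random.gauss(0.0, sigma)
    y = beta * xi + random.gauss(0.0, sigma)
    sxx += x * x
    sxy += x * y
    syy += y * y

# Assumption: the error covariance sigma^2 * I_2 is known and is subtracted
# directly, standing in for a consistent estimator of the error covariance.
q11 = sxx / m - sigma ** 2
q12 = sxy / m
q22 = syy / m - sigma ** 2

print((q11, q12, q22), (h, h * beta, h * beta ** 2))
```

The three entries of the corrected moment matrix land close to h, hβ and hβ², so any continuous function that recovers β from this triple yields a consistent estimator.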




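By $\vec{A}$ we denote the vector $(a_{11}, a_{12}, a_{22})$ for a symmetric (2×2) matrix
$A = ((a_{ij}))_{i,j=1,2}$. Then (2) can be written


$\vec{Q}_m \xrightarrow{P} (h, h\beta, h\beta^2)$.


The set $\mathcal{A} := \{(h, h\beta, h\beta^2) \mid h > 0, \beta \in \mathbb{R}\}$ forms a surface in $\mathbb{R}^3$. Let us consider
real-valued functions f defined on an open subset $\mathcal{A}^*$ of $\mathbb{R}^3$ containing $\mathcal{A}$, with
the property


$f((h, h\beta, h\beta^2)) = \beta$ for all $h > 0$, $\beta \in \mathbb{R}$.   (3)
If f is continuous on $\mathcal{A}^*$, then f provides a consistent estimator $\hat\beta_m$ of β:


$\hat\beta_m := f(\vec{Q}_m)$.


A function of this kind we have e.g. with


$f(x) = x_2 / x_1$, $\mathcal{A}^* = \{x = (x_j)_{j=1,2,3} \mid x_1 \ne 0\}$.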
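
Property (3) does not single out one function. As a small illustration (the function names are hypothetical), both the ratio of the second to the first coordinate and the ratio of the third to the second coordinate return β on the surface of points (h, hβ, hβ²); the second ratio is only defined where the second coordinate is nonzero, so it satisfies (3) only for β ≠ 0.

```python
def f_ratio_12(x):
    """Return x2 / x1; defined wherever x1 != 0."""
    return x[1] / x[0]


def f_ratio_23(x):
    """Return x3 / x2; defined wherever x2 != 0, which excludes beta = 0."""
    return x[2] / x[1]


def surface_point(h, beta):
    """Point (h, h*beta, h*beta^2) on the surface of limits."""
    return (h, h * beta, h * beta ** 2)


checks = [(0.5, -2.0), (1.0, 0.3), (4.0, 1.5)]
results = [
    (f_ratio_12(surface_point(h, b)), f_ratio_23(surface_point(h, b)), b)
    for h, b in checks
]
for r12, r23, b in results:
    print(r12, r23, b)
```

Since several functions qualify, each producing a different consistent estimator, a criterion for choosing among them is needed; this is what the comparison of limiting covariances provides.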


Now, as a simple sample function, $\vec{Q}_m$ is asymptotically normal:


$\mathcal{L}\{m^{1/2}(\vec{Q}_m - (h, h\beta, h\beta^2))\} \to N_3(0_3, \Lambda)$.


If f is continuously differentiable on $\mathcal{A}^*$ with derivative $df \in \mathbb{R}^3$, then for $\hat\beta_m$
this implies


$\hat\beta_m - \beta = df((h, h\beta, h\beta^2))\, \vec{Q}_m + o_P(m^{-1/2})$.   (4)


Differentiating (3) with respect to h and β, one obtains linear restrictions
for df:
$df((h, h\beta, h\beta^2))\, A = J_0$   (5)


for certain parameter-dependent matrices A, $J_0$. Let us consider general estimators
$\hat\beta_m$ which satisfy (4) and (5) for an arbitrary nonrandom, parameter-dependent
C in place of $df((h, h\beta, h\beta^2))$ (asymptotic $Q_m$-estimators). These include
in particular the MLE. For estimators of this kind it obviously holds that
$\mathcal{L}\{m^{1/2}(\hat\beta_m - \beta)\} \to N(0, C \Lambda C')$.

A minimization of $C \Lambda C'$ subject to $CA = J_0$ as in the theorem of Gauss-Markov
yields a lower bound for the limiting covariance matrix. It turns out that the
MLE attains this bound; hence it is asymptotically efficient within the class
of asymptotic $Q_m$-estimators. In particular, it dominates some alternatives
considered in the literature (modified 2SLS-estimators, minimum-contrast
estimators based on $Q_m$).
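
The constrained minimization can be made concrete in the two-dimensional example. Differentiating f(h, hβ, hβ²) = β with respect to h and β gives the two restrictions df·(1, β, β²)' = 0 and df·(0, h, 2hβ)' = 1, so one may take the 3×2 matrix with these two columns as the restriction matrix and (0, 1) as the right-hand side. The sketch below assumes a positive definite limiting covariance, called `lam` here and filled with purely hypothetical numbers, and computes the standard Gauss-Markov type solution C = J0 (A' Λ⁻¹ A)⁻¹ A' Λ⁻¹. It then compares the resulting variance with the variances obtained from the gradients of x2/x1 and x3/x2, both of which satisfy the same restrictions.

```python
def solve(mat, rhs):
    """Solve mat @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, mat[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def quad(c, lam):
    """Return the scalar c * lam * c' for a row vector c."""
    return sum(c[i] * lam[i][j] * c[j] for i in range(3) for j in range(3))


h, beta = 2.0, 0.5
# Columns of A: derivatives of (h, h*beta, h*beta^2) with respect to h and beta.
a_cols = [[1.0, beta, beta ** 2], [0.0, h, 2.0 * h * beta]]
j0 = [0.0, 1.0]
# Hypothetical positive definite limiting covariance (illustration only).
lam = [[2.0, 0.5, 0.2], [0.5, 1.5, 0.3], [0.2, 0.3, 1.0]]

x_cols = [solve(lam, col) for col in a_cols]   # columns of lam^{-1} A
gram = [[sum(a_cols[k][i] * x_cols[l][i] for i in range(3)) for l in range(2)]
        for k in range(2)]                     # A' lam^{-1} A
w = solve(gram, j0)
c_opt = [sum(x_cols[k][i] * w[k] for k in range(2)) for i in range(3)]

c_ratio_12 = [-beta / h, 1.0 / h, 0.0]           # gradient of x2 / x1 on the surface
c_ratio_23 = [0.0, -1.0 / h, 1.0 / (h * beta)]   # gradient of x3 / x2 on the surface


def constraint(c):
    """Return c * A, which should equal J0 = (0, 1)."""
    return [sum(c[i] * a_cols[k][i] for i in range(3)) for k in range(2)]


print(constraint(c_opt), quad(c_opt, lam),
      quad(c_ratio_12, lam), quad(c_ratio_23, lam))
```

All three row vectors satisfy the restrictions, and the first has the smallest variance, which equals J0 (A' Λ⁻¹ A)⁻¹ J0'. Which covariance arises for the statistic in (2) depends on the error distribution and on the estimator of the error covariance, neither of which is modelled here.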

The following investigations concern a general model (LIFU*) with instru- 
mental variables according to Section 3.3.4 which comprises a number of 




variants. We shall develop a general version of the procedure outlined (Section
3.4.5). Accordingly, we fix a distribution model $\{P_\vartheta \mid \vartheta \in \Theta\}$ for the infinite
sequence of observations $\{z_i\}_{i \in \mathbb{N}}$ so that $Z_m := (z_i)_{i=1,\dots,m}$ for $m \ge m_0$ obeys
a model (LIFU*).

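For given $p, q, r \in \mathbb{N}$, $p \ge r \ge q$, $p > q$, define in case $r < p$


$J := [0_{r \times (p-r)} \mid I_r]'$, $J^{\perp} := [I_{p-r} \mid 0_{(p-r) \times r}]'$
and for $r = p$,
$J := I_p$,
and let $\mathcal{J} = \mathcal{R}(J)$. Let $\mathcal{P}_0$ be a set of probability distributions P over $[\mathbb{R}^p, \mathfrak{B}^p]$
with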
J Pie hig = a (AP 0b, ey wl ered tse 
where $J'P$ is the image of P under the linear mapping $J' : \mathbb{R}^p \to \mathbb{R}^r$.
Let also be given: 


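— an $m_0 \in \mathbb{N}$, a sequence $\{n(m)\}_{m \ge m_0}$ of natural numbers, and an $\alpha \in [0, 1]$
with


$n(m)\, m^{-1} \to \alpha$ as $m \to \infty$, $n(m) \ge p - q$ for $m \ge m_0$


— a sequence $\{\mathcal{W}_m\}_{m \ge m_0}$ of linear spaces with
$\mathcal{W}_m \in \mathfrak{L}_{m, n(m)}$, $m \ge m_0$
— a set $\mathcal{V} \subseteq \{\Sigma \in M_p^{\ge} \mid \mathcal{R}(\Sigma) = \mathcal{J}\}$.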
Let 
PF := {212P | VEV,P€ Pp}, 


and let PS denote the countably infinite product of an element P of ?é with 
itself. Let 


Me r= {{wiien |ui€ R?,1€N, AS € 24,4, € M2: 
£+ FT =R?, R(wi)ins,..m) = £, A(((Miins,...m) I+) Vn, 


and 

B= (P| {udion), PPE P, (uiiew € M”. 
Let {P5, 6 € O} be given by 

tio = dion + Siiewn ~ Po, 

Siiew ~ (P°)%, 

O == TIE 




We consider the problem of estimating the parameter £ on the basis of observations
$\{z_i\}_{i=1,\dots,m}$ for $m \to \infty$.

This model induces a distribution model (LIFU*), (R) according to Section
3.3.4 for each of the random matrices $Z_m$, $m \ge m_0$; the stated model is the
analogue of (LIFU*), (R) for an increasing number of observations m. The
increasing dimension of $\mathcal{W}_m$ is admitted in order to allow a unified asymptotic
treatment of the model including the case $\mathcal{W}_m = \mathbb{R}^m$, i.e. the case of a model
without instrumental variables. In addition, this assumption allows us to
cover further interesting special cases, like the one of an increasing number of
groups in the ‘model with replicated observations’ (cf. (3.2.1), Example 3.4.7).
In Section 3.4.3 we will provide more examples which demonstrate the usefulness
of this general model. The model can be regarded as one with generalized
nonrandom instrumental variables. Besides nonrandomness, the generalization
consists in admitting $\dim \mathcal{W}_m \to \infty$ as $m \to \infty$.
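
To see how a growing linear space can encode replicated observations, consider the following sketch. It assumes, as an illustration that is not taken from Example 3.4.7 itself, that the linear space is spanned by the indicator vectors of the groups, so that its dimension equals the number of groups. The orthogonal projector onto this space then replaces every coordinate by the mean of its group. The group labels and the data vector are hypothetical.

```python
groups = [0, 0, 1, 1, 1, 2]            # group label of each of the m = 6 observations
m, n = len(groups), max(groups) + 1    # n = number of groups = dimension of the space
sizes = [groups.count(g) for g in range(n)]

# P = sum over groups g of 1_g 1_g' / n_g: orthogonal projector onto the span
# of the group indicator vectors 1_g.
proj = [[(1.0 / sizes[groups[i]]) if groups[i] == groups[j] else 0.0
         for j in range(m)] for i in range(m)]

v = [1.0, 3.0, 2.0, 4.0, 6.0, 5.0]
pv = [sum(proj[i][j] * v[j] for j in range(m)) for i in range(m)]
trace = sum(proj[i][i] for i in range(m))

print(pv, trace)
```

A vector that is constant within groups is left unchanged by this projector, which is how an adequacy condition of type (A) would be met when the incidental parameters are constant within each group. Adding groups as m grows increases the trace, and with it the dimension n(m).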

We consider some further assumptions about P* and {u;};ey. Let 


Oo 


geres 


(N) Po = {N(0p; Pa) 
(V) v=, 

with $\mathcal{V}_\nu$ from Section 3.3.4 (iv), ν = 1, 2, 3
(Ex) £ + Jo = R* 

(Jo = R [0 p-q) xq | Zq]), cf. Section 3.3.4) 
(A) MM NSW. m = mM 


If (Ex) or (A) is assumed, we speak of an explicit or adequate case, respectively.

(C1) H]=p—4 

(C2) m1M,,M/,——>+ H, 


m m—->co 


(C3) (a) m — n(m) — 7 


m—->co 


(b) (m — n(m))? My, PwsM in aoa? Ha 


m m—->oo 


(c) [Hf —oflul=p —9 


(C4) Rie Ne meg 0] 
(C5) max m|M,Pw,ey” ||? sz 0. 


1sism 


Here $e_i^{(m)}$ denotes the ith unit vector in $\mathbb{R}^m$ and $1_m := (1, \dots, 1)' \in \mathbb{R}^m$.




All statements of Section 3.4 refer to this model, with a parameter space which is
restricted or specified from $\mathcal{P}^{\zeta} \times \mathcal{M}^{\mu}$ by the current assumptions. This model
we denote by $\{P_\vartheta \mid \vartheta \in \Theta\}$ where Θ represents the parameter space in question.
As in Section 3.3.4 (cf. Remark 3.3.4) we do not distinguish between
$\{P_\vartheta \mid \vartheta \in \Theta\}$ and the induced model for $\{J'z_i\}_{i \in \mathbb{N}}$ with given $\{J^{\perp\prime} z_i\}_{i \in \mathbb{N}}$. Furthermore,
we proceed according to Remark 3.3.6.

Let us now fix some notation occurring throughout Section 3.4. 


(i) We use the denotations of Section 3.3.4, partially supplied with an index 
m; in particular, 


Ge ¥' ) (y) (r) *(v) iy(r) 
Gate Lita Cm? Oa? Qom ? E,,; Eom? Mim, Mom, 
Lom) Ocm> Lens Sey Sir dare ie 2, 3 


and parameters B, B,, By, X,, 07. 
(ii) Let 


1g) = (dR) 
for T°?) according to Section 3.3.4 (xii), » = 1, 2, 3 
(iii) Let 
O” := OF + n(m) m (63, — o;) LT o(r), ) =o 


with X from Section 3.3.4, (iv); 6%,, 18 defined in Theorem 3.4.3 below. 
Let 


me = Vim + r(m) m\(Gom — 07) I'ZII (rv), v = 1,2,3. 
Then analogously to Lemma 3.3.2 we verify 
QO = Pp (J4m8y, JY + IQS’) F 
(iv) Under (Ex) let V,, be defined by 
WM, sph p 3 
This uniquely determines V,,. Let 
Nore pet OGG) Nine 
Then N,,, = U,,, holds. 
(v) Let 
A, :=mU,Py M’,, Hon = mM, M,, 


Aim i= (m — n(m))-1M »Py1M', « 




(vi) Let 
re eae 2 aan ofl s(v), G° := A, — n(m) m= Aanl g(r), 
PSS AB TA Sie 

(vii) Let 
GP =H, »=1,2, GP := (1—«) A + off. 


(viii) Let A be one of the matrices defined under (v), (vi), and (vii). Then 




R(A) S # is always true. This analogously holds for the limits for m > oo. 
Thus, under (Ex) there exists an A € Mt,_, with A = [gAL’, and A is 
uniquely determined. Accordingly let H, H,, Hy, Hom, Ha, Ham G”, 
Go), Gy”, v = 1, 2, 3 be determined by 
A= LAatl,, © H,,'= LpH,,by, ete. 
and g by g = Lzg. 
(ix) If AG”) = F holds, then according to [A 1.5] there is a representation 
GO = Paw (P5314 Py. + IGS’) Fi) 


with uniquely determined E™ EM ry (pr, GEM, As JUG I+ 
= lim m18y,, » = 1, 2,3, and since R(H4) S F (because of R(M;p) 


m—->oo 


CW ,,), We infer, as in Lemma 3.3.2, that #) is independent of ». Let 
B= &", tie Ber pas 
(x) Let 
8 := 25, »=1,2, 8:=5,4 A, 
with from Section 3.3.4 (iv). Let 
By := (Pg. + [Sy }?) Fe, vy = 1,2, 3. 
(xi) Let 
Th TAA yt oY 
for L! € Myyqs R(L+) = #2. 
(xii) Let 
M(m) := m — n(m) Ly3)(r), V1 $2;'3:. 


(xiii) Let I’, be the projection operator in R* onto the linear subspace {A| A 
€ M,}. We have 


y= Fle tI), = sle+ V2 


(for definition and properties of the matrices [{s,1, I{s; see [A 1.10]). 




Let f; € Nee x8(s+1)/2 be such that 
r= 1 hee 1 ai Lenny 
(xiv) In some examples we additionally use the following denotations: 


Xoq i= (O(r-a) x (p=) ier iQ ) Zn 


eh) CI 


Y,, := (0 Te) Leg 


qx (p—q) 
5S (ahs | Xcel 
Then we have 
Leg Mint dent aly | Lew | Xam Yl: 
Hor A =O Fea) 
Toy t= iy, Lk = 0s. 


(xv) Let 
®, = HO ® OS’ 
We = EGC’ & 66’. 


For the sake of clarity, some further expressions depending on the parameter
ϑ will be indexed by ϑ. As in Section 3.3.4 we will frequently suppress the
index ν.


Remark 3.4.1 Consider a distribution model for a sequence of independent
observations $\{z_i\}_{i \in \mathbb{N}}$:


$\{(P_{\gamma, \eta_i})_{i \in \mathbb{N}} \mid (\gamma, \{\eta_i\}_{i \in \mathbb{N}}) \in \Gamma \times H^{\mathbb{N}}\}$,


where $\eta_i$ is identifiable in the distribution model for $z_i$, for $i \in \mathbb{N}$. There,
the parameter γ is called the structural parameter and the parameters $\eta_i$,
$i \in \mathbb{N}$, incidental parameters. Accordingly, in the present model $P^{\zeta}$ is a structural
parameter and $\mu_i$, $i \in \mathbb{N}$, are incidental parameters. As p − r components
of $\mu_i$ are known, $\mu_i$ has r − q functionally independent components;
thus we recognize infinitely many unknown incidental parameters in the
model $\{P_\vartheta \mid \vartheta \in \Theta\}$. As mentioned before, this is the specific feature of the
model with respect to asymptotic theory. Under the restriction (A) it persists
in general: then the number of unknown functionally independent components
of $(\mu_i)_{i=1,\dots,m}$ equals (r − q) n(m) and thus is admitted as infinitely increasing;
the convergence assumptions do not restrict $(\mu_i)_{i=1,\dots,m}$. Only in the case of
constant n(m) under (A) does a model with finite-dimensional parameter
space result (cf. Example 3.4.3). General models with infinitely many unknown
incidental parameters have been considered by Wald (1948), Neyman and
Scott (1948), Andersen (1970a, b), and Pfanzagl (1970).
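
The effect of an unboundedly growing number of incidental parameters on maximum likelihood estimation is visible already in the classical paired-observations example associated with Neyman and Scott (1948). The following sketch is an illustration of that example, not of the model (LIFU*): every pair of normal observations has its own unknown mean, and the common variance is the structural parameter. All numerical values are hypothetical.

```python
import random

random.seed(2)
m, sigma2 = 20000, 1.0

ss_within = 0.0
for _ in range(m):
    mu_i = 10.0 * random.random()           # incidental parameter: a new mean for every pair
    x1 = random.gauss(mu_i, sigma2 ** 0.5)
    x2 = random.gauss(mu_i, sigma2 ** 0.5)
    mean_i = 0.5 * (x1 + x2)                # MLE of mu_i
    ss_within += (x1 - mean_i) ** 2 + (x2 - mean_i) ** 2

mle = ss_within / (2 * m)     # joint MLE of the variance: tends to sigma2 / 2
corrected = ss_within / m     # divides by the m degrees of freedom left after fitting the means

print(round(mle, 3), round(corrected, 3))
```

The joint MLE of the variance settles near half the true value however large m becomes, whereas a simple rescaling restores consistency. The modification of the variance estimator in the adequate case of the present section has the same flavour.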

In a suitable parametrization one obtains $(P^{\zeta}, \mathcal{L})$ as a structural parameter;
under (Vν), ν = 2, 3, we consider the problem of estimating the structural
parameter $(\sigma_\zeta^2, \mathcal{L})$ or $(\Sigma_\zeta, \mathcal{L})$, respectively.

A sequence of estimators $\{T_m\}_{m \ge m_0} = \{T_m(Z_m)\}_{m \ge m_0}$ will be referred to in
short as an estimator in the sequel.

We see that under the specification (V1) the canonical instrumental variables
estimator (CIVE) $\hat{\mathcal{L}}_{Cm}$ according to Definition 3.3.1 is defined for $m \ge m_0$.
For sufficiently large m also (B2) (Section 3.3.4 (xiii)) is satisfied, so that also
under (V2) the estimator $(\hat\sigma^2_{Cm}, \hat{\mathcal{L}}_{Cm})$ of $(\sigma_\zeta^2, \mathcal{L})$ is defined. Assume this holds
for $m \ge m_0$. Under (V3) condition (B3) is needed. Now, if (C3) (a) holds, then
(B3) is satisfied for sufficiently large m. We see that under (V3), (C3) (a) the
estimator $(\hat\Sigma_{Cm}, \hat{\mathcal{L}}_{Cm})$ of $(\Sigma_\zeta, \mathcal{L})$ is defined for $m \ge m_0$.


3.4.2 Consistency 


In this section consistency of the CIVE of the structural parameter is to be
proved. This will be carried out for a general specification (Vν), ν = 1, 2, 3,
i.e. for the case of a not necessarily normal distribution model $\mathcal{P}^{\zeta}$ for the error
variable ζ. The problem of consistency of the estimators $\hat\sigma^2_{Cm}$ and $\hat\Sigma_{Cm}$ is investigated
as well; here in general inconsistency obtains. In the adequate case
the estimators can be modified to be consistent, or consistent estimators can
be found.

After this the meaning of the assumptions made in each case to prove the consistency
of CIVE is discussed. These represent restrictions on the parameter
sequence $\{\mu_i\}_{i \in \mathbb{N}}$ which are necessary in dependence on the error distribution
model (cf. Section 3.1.5). These assumptions can be construed as analogues of
the correlation condition (Id) in case of a random IV model (LIFU-). First
we show consistency of the CIVE $\hat{\mathcal{L}}_{Cm}$ under (Vν), ν = 1, 2, 3. For this we need
the following lemma.


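Lemma 3.4.1 Let $\{P_m\}_{m \ge m_0}$ be a sequence of projectors $P_m \in M_m^{\ge}$, $m \ge m_0$.
Then

$$m^{-1} Z_m P_m Z_m' - E_\vartheta\, m^{-1} Z_m P_m Z_m' = m^{-1}(\zeta_m P_m M_m' + M_m P_m \zeta_m') + O_{P_\vartheta}\big(m^{-1} (\operatorname{tr}[P_m])^{1/2}\big).$$

Proof. Since

$$m^{-1} Z_m P_m Z_m' - E_\vartheta\, m^{-1} Z_m P_m Z_m' = m^{-1}(\zeta_m P_m M_m' + M_m P_m \zeta_m') + m^{-1} \zeta_m P_m \zeta_m' - m^{-1} \operatorname{tr}[P_m]\, \Sigma_\zeta,$$

it suffices to show that

$$D\, m^{-1} \zeta_m P_m \zeta_m' = O(m^{-2} \operatorname{tr}[P_m])$$

or

$$D\, a' m^{-1} \zeta_m P_m \zeta_m' a = O(m^{-2} \operatorname{tr}[P_m])$$

uniformly for $a \in \{x \in \mathbb{R}^p \mid \|x\| = 1\}$. Now $a'\zeta_m$ is a vector of independent
identically distributed random variables $a'\zeta_i$, $i = 1, \dots, m$. We have

$E\, a'\zeta = 0$, $D\, a'\zeta = a'\Sigma_\zeta a$,

$E (a'\zeta)^4 = E\, a'\zeta\zeta'a \otimes a'\zeta\zeta'a = (a' \otimes a')\, \Psi_\zeta\, (a \otimes a)$.

Thus, according to [A 3.18],

$$D\, a' m^{-1} \zeta_m P_m \zeta_m' a = m^{-2} \sum_{i=1}^{m} p_{ii}^2 \{(a' \otimes a')\, \Psi_\zeta\, (a \otimes a) - 3 (a'\Sigma_\zeta a)^2\} + 2 m^{-2} \operatorname{tr}[P_m]\, (a'\Sigma_\zeta a)^2$$

for $P_m = ((p_{ij}))_{i,j=1,\dots,m}$. The assertion follows with

$$m^{-2} \sum_{i=1}^{m} p_{ii}^2 \le m^{-2} \sum_{i=1}^{m} p_{ii} = m^{-2} \operatorname{tr}[P_m]. \qquad ∎$$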
This lemma yields convergence statements for the matrices Qo, under the 


respective model assumptions. 
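
The last step in the proof of Lemma 3.4.1 uses that the diagonal elements of an orthogonal projector lie between 0 and 1, so that the sum of their squares is bounded by the trace. A quick numerical check, with a hypothetical two-dimensional subspace spanned by a constant and a linear trend, might look as follows.

```python
import math


def orthonormalize(vectors):
    """Gram-Schmidt: return an orthonormal basis of the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coef = sum(a * b for a, b in zip(w, q))
            w = [a - coef * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > 1e-12:
            basis.append([a / norm for a in w])
    return basis


m = 6
vectors = [[1.0] * m, [float(i) for i in range(m)]]   # constant and linear trend
basis = orthonormalize(vectors)

# Diagonal elements p_ii of the projector P = sum over k of q_k q_k'.
diag = [sum(q[i] ** 2 for q in basis) for i in range(m)]
trace = sum(diag)                    # equals the dimension of the subspace
sum_sq = sum(d * d for d in diag)    # bounded by the trace

print(trace, sum_sq)
```

Here the trace equals 2, the dimension of the subspace, and the sum of squared diagonal elements is smaller, in line with the inequality used in the proof.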


Lemma 3.4.2 Under (V,), » = 1, 2, 


mF Ge. 


m—->co 


Proof. Observe that 
LOon = Hg 
According to Lemma 3.4.1 it suffices to show that 
mE mPy,Mn + MnPw,Sm) = or(1)- 
But this immediately follows from 
Lyme mPo,Mn =%pxp> — Dobm = Im @ Xs 
Dyn 3h ,,P yp, My, = mA, © Z, = o(1) 
and the model assumption 
A,— > HH." 


m= m—-oo 


In the case of the model assumption (V3), the conditions (C3) are needed.
For this we prove the following lemma.




Lemma 3.4.3 Under (V3), (C3) (a) (b), we have 


so 2 A, +E. 


mm m—-0co 
Proof. In Lemma 3.4.2, replace @,,, m, MM nPy Mn —— Hf and m+ co 
The following lemma is an immediate consequence. 
Lemma 3.4.4 Under (V3), (C3) (a), (b),


(3) P Fa 
i rverera ie 


The following lemma immediately prepares the proof of the consistency of
$\hat{\mathcal{L}}_{Cm}$:


Lemma 3.4.5 Under (V,), (C1), v = 1, 2, or under (V3), (C3), we have 
(2) Om Par G, MG) =F 


(b) E,, ++ E 
(c) Eom > Ey, Ey € Mi xp: 


Proof. (a) The convergence holds due to Lemmas 3.4.2 and 3.4.4. As@ = lim Gn, 


m—>co 
LG, = Ogxm> mm, holds for Le M,x4, AL) = £1, we obtain 
L'G = 05.m and thus RG) S £. The assumptions (C1) or (C3) entail 7[@] 
= p — q; this implies the assertion. 


(b) Part (a) implies J+’OomJ+ —s J’GJ+ € M>_,; since by virtue of [A 1.5], 


m—* 


Lemma 3.3.2, and (ix) from Section 3.4.1, we have 
By = J’ QomI* (I QomI*) 1, B= SEIMIVGS!)4, 
the assertion follows. 


(c) According to Lemma 3.4.3, S,, + So; because of R(A4) S I (which is a 


consequence of 2(M’,J+) © W,,), we have R(So) = J. As a result [Sj]? 
—> [Sj }?2, RSP ]}/2) = J, and from the definition of Ey, (Theorem 3.3.1 (b)) 
the assertion follows. @ 


With that we are now able to prove the consistency of the CIVE $\hat{\mathcal{L}}_{Cm}$ under
the specifications (Vν), ν = 1, 2, 3. Observe that the notion of probability
convergence on the set $\mathfrak{L}_{p,p-q}$ is well defined by the pertaining structure of a
differentiable manifold (cf. [A 3.16]).


Theorem 3.4.1 Under (Vν), (C1), ν = 1, 2, or under (V3), (C3), for the CIVE
$\{\hat{\mathcal{L}}_{Cm}\}_{m \ge m_0}$ of the structural parameter £, we have


$\hat{\mathcal{L}}_{Cm} \xrightarrow{P_\vartheta} \mathcal{L}$ as $m \to \infty$.




Proof. Lemma 3.4.5 yee 
EonQonEom ae HGH, € MexD- 


Let (Ai,m)i=1,...99 41.m SS +++ SAp,m be the ordered p-tuple of the eigenvalues 
of Bom QomB on: Let 


Am = {Zn € Mom | Ae mm — = Aninlm 'Sy,,_,]} > 


where (A; m)i—1,....m> Mim are understood as functions of Z,,. 
Lemma 3.4.5 implies 


m'Sy,, wa? TU GT* € M5. 


m—* 


and then according to [A 1.1], gis —_+ 0. It follows that P;(4A,) ==> 0 


™- 


hence by virtue of Corollary 3.3.1,
PI {BomLom = "p,p-q (BomQomEom)}) a 0. 


The above implies H)GH, € M!?—” (see [A 1.2]); since M!?” is open in M, 
and 1p,p-q is continuous on MP ([A 1.4]), it follows that
Np.p-q{Hom@omEom) ao E,f£: From the continuity of the mapping 

Ae Ket Ad.) Aneta 2 ie Ur Pate 


(see [A 3.16d]), it follows that 


Eq np,p-q(EomQomLom) a AG 


m—>oo 


This implies the assertion.


For the limiting behaviour of the matrix $\Pi_m$, the following statement can
be shown.


Lemma 3.4.6 

(a) Under ee (Cl), 
1h es a oll 

(b) Under bad (C3), 


A 


IE 


Se 7 


Proof. (a) Since the mapping f£ ++ is continuous on &,,, and 
{£ € Bap-¢ | £ + J = R®} is open in &,,-, (parts (c) and (f) of [A 3.16}), 
Theorem 3.4.1 implies 74, ——+ £ aiee to Beet ) of a oe there exist 
Ls} mzxmgs E+ such that Di, D1¢M,.,, 2 PLR = £1 with 
LD} + I+. In accordance with Section 3.4.1 a ae ces - : 4 (xii), let 


m m—0co 
TT, = L2(L2/ E74) LA! = EA (E2 ELA) D2’. 




3.4, Asymptotic theory for linear functional relations 329 


Then 
1, *—+ o2L'(LY EL!) LY = o3ll. 


(b) 
due to Lemma. 3.4.3, and of R(H4) Cf. w 


which holds 


Now we consider estimation of the structural parameter $\sigma_\varepsilon^2$ under the
specification (V2).


Theorem 3.4.2 Under (V2), (C1), (C2), for the estimator {6%} m=m, Of the 
structural parameter o; we have 


6% + (1 —a + r-lgx) of + 1-1) tr [Z*(Ay — A). 


Cm mae 


Proof. Using (C2) one shows analogously to Lemma 3.4.2 that 


m Sz» Sone 


m Fis 


Then Lemmas 3.4.2 and 3.4.6 imply 
MUTE 7,0 oar WA + «%:). 


m m—>co 


Now 7H = Onxp holds; from the form of 6%, according to Theorem 3.3.1 (d) 
and from tr [J7X,] = dim 2?”£1 = dim J'f+ = q (which holds due to part 
(a) of [A 1.6]), it follows that 


m-* tr [I nQz,.W_] a> %929 (6) 
and hence the assertion. 


This means that the estimator $\hat\sigma^2_{Cm}$ for $\sigma_\varepsilon^2$ is generally
inconsistent. This also holds in the adequate case; indeed, then (A) and the
definitions of $H_m$ and $H_{0m}$ only imply that $H_m = H_{0m}$, $H = H_0$. However,
in the adequate case consistency can be obtained by modifying $\hat\sigma^2_{Cm}$:
the estimator


Bin = (1 — a + 9 Mgay 1 Bb 


is consistent for $\sigma_\varepsilon^2$ under (V2), (A), (C1). Furthermore, in the general
(not necessarily adequate) case, provided that $\alpha \ne 0$, we can immediately find
a consistent estimator on the basis of (6): the estimator


Gm = (omg)? tr [TmQz,,.0 5] (7) 


is consistent for $\sigma_\varepsilon^2$ under (V2), (C1), $\alpha \ne 0$. This estimator is not
defined for $\alpha = 0$. Below we give an estimator which is consistent also in this
case, provided condition (C2) is met.


Theorem 3.4.3 Under (V.), (C1), (C2) for the estimator 
Gbm = (mg)? tr [1 mQz.,] 




of the structural parameter o; we have 


a P ‘ 
oom ———> 0;- 


m—co 


Proof. Analogously to Lemma 3.4.2 one shows, using (C2), that 


mQz, ——+ Hy + 2¢. 
The definition of M, implies 2(A) S £ and thus /7Hy = 0,,». From this with 
Lemma 3.4.6 (a) the assertion follows. @ 


The question of efficiency of the estimators for o7 and X; (under (Vs)) will 
not be treated here; the consistent estimator 6%, is needed in the following 
to construct the class of estimators to be compared with Pom. Now, the counter- 
part of Lemma 3.4.5 (a) for Q,, is as follows. 


Lemma 3.4.7 Under (V1), (C1), or under (V2), (C1), (C2), or under (V3), (C3),
we have


OSG SHOALS 


m-—->oco 


Proof. The proof is obtained immediately from Lemma 3.4.5, Theorem 3.4.3, 
and from the definition of O,,. 


Now we consider estimation of the structural parameter 2; under (V3). 


Theorem 3.4.4 Under (V3), (C3) for the estimator (Se ea of the structural 
parameter X', we have 


Lom — + (1 —a)(H4+2;)+ oz! P silt gi 02! 


CO 


Proof. Lemmas 3.4.4 and 3.4.6 (b) imply 


1, Qomll np ——> IGT’ = 0,» 


m—>co 


and with Lemma 3.4.3 one obtains 
SIT »n(m) m1 Sy es od llX;, = ake! Prt ys de!. 


From this and from the form of Yen, according to Theorem 3.3.1 (d) the asser- 
tion follows. @ 


Thus the estimator Yo, is in general not consistent; in the adequate case 
one obtains consistency if « = 0. The estimator S®) is consistent for 2; in 
the adequate case, according to Lemma 3.4.3. 


Remark 3.4.2 Let us consider the explicit case, i.e. the case of a model restric- 
ted by the assumption (Ex). Then under the assumptions of Theorem 3.4.1 




for the CIVE $\hat B_{Cm}$ of the structural parameter $B$ we have


$\hat B_{Cm} \xrightarrow[m\to\infty]{P} B$


because of the form of $\hat B_{Cm}$ according to Remark 3.3.10 and the continuity
of $o^{-1}$ (see [A 3.16 (a)]).


Now let us interpret the results obtained on consistency, as well as the
assumptions. In particular, we wish to understand the assumptions as
counterparts of the conditions on the error distribution or on the instrumental
variable in the model (LIFU-) of Section 3.3.4. Note that in the present model
the dimension of the IV-space $\mathcal W_m$ is increasing in general; therefore the
analogy with the model (LIFU-) with given dimension of IV is to be understood
somewhat loosely.

Condition (C1) represents an asymptotic counterpart of (3.3.105) ((3.3.105)
was not assumed in Section 3.3.4). If $m - \dim \mathcal W_m \to \infty$ is satisfied,
i.e. under (C3) (a), then (C1) can be seen as an essential restriction of the
possible values of the parameter $\{\mu_i\}_{i\in\mathbb N}$ or of the unknown
incidental parameters. Thus (C1) expresses that additional information is
available in the model in the form of the sequence of IV-spaces
$\{\mathcal W_m\}_{m\ge m_0}$. In general the case $\mathcal W_m = \mathbb R^m$, $m \ge m_0$,
is included, in which the sequence of IV-spaces $\{\mathcal W_m\}_{m\ge m_0}$ does not
provide additional information. In this case (C1) can be interpreted as a
regularity condition in the sequence model, which is an asymptotic counterpart of
3 G1 San I 

Thus condition (C1) could be understood as a condition of ‘asymptotically
nonvanishing correlation’ between the partially unobserved variables $M_m$ and
the instrumental variables $W_m$ (with $\mathcal W_m = \mathcal R(W_m')$); in case
n(m) = n, $m \ge m_0$, this corresponds to the condition (Id) in (LIFU-), (IV) of
Section 3.3.4. But since $\dim \mathcal W_m \to \infty$ is admitted, (C1) cannot be
construed as a direct counterpart of (Id); (C1) is weaker than such an assumption.

Under (V1), (V2), condition (C1) suffices for the consistency of the CIVE
of $\mathcal L$, since sufficient information on $P^\varepsilon$ is already available
(cf. Section 3.3.4 (A)-(C)). Indeed, under (V1), (V2) an instrumental variable is
not necessary for consistent estimation if (C1) is satisfied for
$\mathcal W_m = \mathbb R^m$, $m \ge m_0$ (see above). But under (V3) condition (C1) is
in general no longer sufficient. The conditions (C3), which guarantee the
consistency of the CIVE for $\mathcal L$ under (V3), are specific to the present
model. They represent the counterpart of the condition (Id) in (LIFU-), (IV) for
the present case of generalized nonrandom instrumental variables
($\{\mathcal W_m\}_{m\ge m_0}$, $m^{-1}\dim \mathcal W_m \to \alpha \in [0, 1]$).
Condition (C3) (a) excludes $\mathcal W_m = \mathbb R^m$, $m \ge m_0$; it may be
considered as necessary to obtain the required information on $\Sigma_\varepsilon$.
Condition (C3) (b) is a regularity condition which restricts the ‘asymptotic
deviation’ from the adequate model (with (A)). In the adequate case these
conditions just mean consistent estimability of $\Sigma_\varepsilon$ (cf. Lemma 3.4.3).
But in general consistent estimability of $\Sigma_\varepsilon$ is not necessary for the
consistency of $\hat{\mathcal L}_{Cm}$.

Compared with (C1), condition (C3) (c) can be considered as the genuine
analogue of the correlation condition (Id) for the case of increasing dimension
of $\mathcal W_m$. Indeed, condition (C3) (c) remains satisfied if
$\{\mu_i\}_{i\in\mathbb N}$ is substituted by $\{\mu_i + d_i\}_{i\in\mathbb N}$ for a
sequence of independent identically distributed $\mathbb R^p$-valued random
variables $\{d_i\}_{i\in\mathbb N}$. But such a property could be required for a
condition about $\{\mu_i\}_{i\in\mathbb N}$ and $\{\mathcal W_m\}_{m\ge m_0}$ which would
correspond to (Id).

In the case $\alpha = 0$, i.e. with ‘essentially finite’ dimension of $\mathcal W_m$
for $m \to \infty$, (C3) (c) becomes (C1) and can then be interpreted as above. In
the adequate case, (C3) (c) also becomes (C1), since then $H_\Delta = 0_{p\times p}$
holds. Condition (A) represents a counterpart to a condition of ‘full
correlation’ in the model (LIFU-), (IV):

Day 


ww 


=='0) 


PXp* 


Under (A) condition (C1) is, as above in the case $\mathcal W_m = \mathbb R^m$, a
regularity condition in the model.


3.4.3 Examples 


In this section we discuss several special cases of the model introduced in 
Section 3.4.1, as well as the meaning of the results obtained in each case. At 
the same time the motivation for the generality of the model will be clarified, 
as several types of a linear functional relationship, some of which were already 
introduced in the preceding sections (3.1.3, 3.3.1), are included. 


Example 3.4.1 (Explicit case) As the further results in the model (limiting
distribution, asymptotic efficiency) concern the explicit case, we give an 
equivalent formulation of the model and of the assumptions, making use of 
the simple parametrization possible under (Ex). By virtue of (iv) we see that 
A(M;,J+) S W,, is equivalent to R(Nj_) GW». The convergence assumption 
in the definition of M° is equivalent to m 1N»Py Nm asa? H. In view of 
(viii) the remaining model assumptions can also be expressed in terms of N,,. 

The formal definition of the model which coincides under (Ex) with the 
one introduced in Section 3.4.1 is the following: Let 


Re = {Edi lf —liiail, &i€ Ree, & €R-1, i€N, 


aH € MF: A(((Ard)inn,...m)) FS Vn, mS mo, 


and 
Or (PF, Bete); PUPS wB © MGS ey eee Oo os 


Let {P, | 0 € O} be given by 

(Zidieon = (Lakition + (Chien ~ Py 

Sihiew ~ (P*)® 

ONSITE S Max w—a ee ae 


Here $\{\xi_i\}_{i\in\mathbb N}$ is a sequence of unknown incidental parameters which
satisfies only a convergence assumption (cf. Remark 3.4.1). The equivalent
formulations of the additional model assumptions (with
$N_m = (\xi_i)_{i=1,\dots,m}$) are:


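(A) $\mathcal R(N_m') \subseteq \mathcal W_m$
(C1) $r[H] = p - q$
(C2) $m^{-1} N_m N_m' \to H_0$
(C3) (a) $m - n(m) \to \infty$
(b) $(m - n(m))^{-1} N_m P_{\mathcal W_m^\perp} N_m' \to H_\Delta$
(c) $r[H - \alpha H_\Delta] = p - q$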


(C4) m1N,1_,— ag 
(C5) $\max_{1\le i\le m} m^{-1}\,\|N_m P_{\mathcal W_m} e_i\|^2 \to 0$


Example 3.4.2 Let us consider the case of the trivial IV-space
$\mathcal W_m = \mathbb R^m$, i.e. n(m) = m, $m \ge m_0$. Then (A) is satisfied, and
under (V$_\nu$), (N), $\nu$ = 1, 2 the CIVE is the MLE. Condition (C3) (a) is not
fulfilled; under (V3) the CIVE is not defined.


Example 3.4.3 Let us consider the case of an IV-space $\mathcal W_m$ with fixed
dimension: n(m) = n, $m \ge m_0$. The conditions (B$_\nu$) are satisfied for
sufficiently large m; hence the CIVE exists. Let $W_m \in M_{n\times m}$,
$\mathcal R(W_m') = \mathcal W_m$, $m \ge m_0$, and under (Ex) let the following
assumptions be fulfilled:


(C6) mW Wo — > Ss Cty 


m m—>co 


(b) m4N,,W,, ss TE MES 


™m m—co 


(C7) max m-! ||W,,e!||? +0. 


1Sism 
If, in addition, (C2) holds, then, as can easily be verified, the conditions (C1),
(C3), (C5) are fulfilled. We then have G = T5717". 

Condition (C6) (b) represents the asymptotic counterpart of (Id) in the
model (LIFU-), (IV) of Section 3.3.4. Thus $\hat B_{Cm}$ is always consistent. Under
(V>) Gm is also consistent; in the adequate case this also holds for 6@,,. Under 
(V;), in the adequate case Lem and S') are consistent for 2;. 




Example 3.4.4 In Example 3.4.3 let n = p − q. Then we have
$r[Q_{Z_m\cdot W_m}] = p - q$, and according to Theorem 3.3.1 (b), (c) one obtains


$\hat{\mathcal L}_{Cm} = \mathcal R(Q_{Z_m\cdot W_m}) = \mathcal R(Z_m W_m')$.
In view of 7[Qz, .w,-v.,] S7 — 7 (Lemma 3.3.2), Remark 3.3.10 implies 


Boom = 0" R(Qz,,,-Wm Nip)) 
= ¥,,(Pw,, — Pr'yn) Xtm(Xen(Pw,, — Px.) Xam); 


Buca ae VisNin ay BycmXamN tin 


in the notation of (xiv)). This implies that the CIVE and the 2SLS-estimator 
(defined for the adequate case in Section 3.3.1.5) coincide (cf. (3.3.20)). 


Example 3.4.5 (Linear regression model) Let r = q. One obtains (as (Ex) 
is always satisfied, cf. section 3.3.4) 


Y, = BN, + 8%, 


with known $N_m$. In view of Remark 3.3.6 we obtain from Theorem 3.3.1 (a)
that for any possible choice of the IV-space $\mathcal W_m$ the CIVE for B, if it
exists, coincides with the BLUE for B (cf. Bunke and Bunke, 1986, Section 2.1):


A 


Bom SS in ZomM in <a YANG: 


since B = B, (cf. Section 3.3.4 (ii)). For the choice @,, = A(N,,) the conditions 
reduce to 


mN,,Ni, Perera g He (0) 
Hence the linear model is a special case in the sense of Remark 3.3.6. But, 
since the following results are without interest for this case, we will not mention 
it any further. 


Example 3.4.6 In Example 3.4.4 let p = 3, r= 2, g=1. For N,, = 1,, 
m = mg one obtains a bivariate inhomogeneous explicit model (LIFU*) 


Bo eG, 
yi; = B, + Bofoi + 53;, t=1,....m 


(for OP pester = Yo (@i)i—1 ed 
((e4; | EDsat,..m = % in (i), (xiv). Let an IV-matrix Up = (u;)jo1,m€ Mism 


COO 1 Cem ccc. MUG aati on! MO Thais AP, Seine ainsi ea sion, re (A. ema ae 4/0 dl SA 


with the property $\mathcal R(U_m') \ne \mathcal R(1_m)$ be given. Then, with
$\mathcal W_m = \mathcal R((1_m \mid U_m'))$ and
$U_m^* = (u_{im}^*)_{i=1,\dots,m} := U_m(I_m - P_{1_m})$, for the CIVE we have
according to Example 3.4.4 that


$\hat B_{2Cm} = Y_m P_{U_m^*} X_{2m}'\,(X_{2m} P_{U_m^*} X_{2m}')^{-1}
= \sum_{i=1}^m y_i u_{im}^* \Big/ \sum_{i=1}^m x_i u_{im}^*$,


$\hat B_{1Cm} = \bar y_m - \hat B_{2Cm}\,\bar x_m$


($\bar y_m := m^{-1}\sum_{i=1}^m y_i$, $\bar x_m$ analogous). Thus the CIVE is the
simple instrumental variables estimator already introduced in Section 3.3.1.3
(cf. (3.3.9)).
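
The ratio form of the simple instrumental variables estimator in Example 3.4.6 can be illustrated with a short sketch. It assumes scalar observations x_i, y_i and a scalar instrument u_i; the function name `simple_iv_estimator` and the noise-free toy data are hypothetical and only demonstrate the arithmetic with the centred instrument u_i* = u_i − ū.

```python
def simple_iv_estimator(x, y, u):
    """Return (intercept, slope) of the simple IV fit.

    slope     = sum(y_i * u_i*) / sum(x_i * u_i*), with u_i* = u_i - mean(u)
    intercept = mean(y) - slope * mean(x)
    """
    m = len(x)
    if m == 0 or len(y) != m or len(u) != m:
        raise ValueError("x, y and u must be non-empty and of equal length")
    u_bar = sum(u) / m
    u_star = [ui - u_bar for ui in u]  # instrument centred at its mean
    denom = sum(xi * us for xi, us in zip(x, u_star))
    if denom == 0:
        raise ZeroDivisionError("instrument is uncorrelated with x in this sample")
    slope = sum(yi * us for yi, us in zip(y, u_star)) / denom
    intercept = sum(y) / m - slope * (sum(x) / m)
    return intercept, slope


# Hypothetical noise-free data on the line y = 1 + 2 x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0 + 2.0 * xi for xi in x]
u = [0.0, 0.0, 1.0, 1.0]  # any instrument not proportional to the constant
b1, b2 = simple_iv_estimator(x, y, u)
```

With error-free data the sketch recovers the generating line exactly; with errors in x and y the same formula applies, and its consistency rests on the conditions discussed in the text, not on anything in this code.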


Example 3.4.7 Let us consider a model with (Ex), r = p − 1, $N_{1m} = 1_m'$,
$m \ge m_0$ (inhomogeneous model, cf. Example 3.3.1). Let an IV-matrix
$W_m \in M_{n(m)\times m}$ be given by


$W_m = \operatorname{Diag}(1'_{k(1,m)}, \dots, 1'_{k(n(m),m)})$.


Thus $W_m$ gives a grouping of the observations $z_i$, i = 1, ..., m, into n(m)
groups with k(i, m) elements each ($\sum_{i=1}^{n(m)} k(i, m) = m$). In the adequate
case a ‘model with replicated observations’ (cf. (3.2.1)) results. More
generally, the grouping is derived from additional information on
$(\xi_{2i})_{i=1,\dots,m}$ (e.g. in the case r − q = 1 from a known rank statistic
for $(\xi_{2i})_{i=1,\dots,m}$ (Ware, 1972)). The condition $n(m) \ge p - q$
(cf. Remark 3.3.3), i.e. $n \ge r - q + 1$, states that there are at least as many
groups as there are points needed to determine an (r − q)-dimensional affine
manifold.
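
To make the role of the grouping matrix concrete, the following sketch builds a block matrix of ones of the kind described in Example 3.4.7 and applies the orthogonal projection onto its row space, W'(WW')^{-1}W, to a data vector. The helper names `grouping_matrix` and `project_onto_groups` and the sample numbers are hypothetical. Since WW' is diagonal with the group sizes on the diagonal, the projection simply replaces every observation by its group mean.

```python
def grouping_matrix(sizes):
    """Rows are indicator vectors of consecutive groups with the given sizes."""
    if any(k <= 0 for k in sizes):
        raise ValueError("group sizes must be positive")
    m = sum(sizes)
    rows, start = [], 0
    for k in sizes:
        rows.append([1.0 if start <= j < start + k else 0.0 for j in range(m)])
        start += k
    return rows


def project_onto_groups(z, sizes):
    """Apply W'(WW')^{-1}W to z for the grouping matrix W built from sizes."""
    w = grouping_matrix(sizes)
    if len(z) != len(w[0]):
        raise ValueError("length of z must equal the sum of the group sizes")
    # (WW')^{-1} W z: WW' = diag(sizes), so this is the vector of group means.
    means = [sum(wij * zj for wij, zj in zip(row, z)) / k
             for row, k in zip(w, sizes)]
    # W' spreads each group mean back over the members of its group.
    return [sum(w[i][j] * means[i] for i in range(len(sizes)))
            for j in range(len(z))]


z = [1.0, 3.0, 10.0, 20.0, 30.0]
fitted = project_onto_groups(z, [2, 3])
```

This is why estimators built on such an IV-space depend on the data only through group means and within-group residuals.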

With this example the chosen generality of the model of Section 3.4.1, and
in particular the assumption $\alpha \in [0, 1]$, can be justified more clearly. In
addition to the case of constant group number n, the case of increasing
group number $n(m) \to \infty$ is admitted, and with $\alpha > 0$ also the case of
constant allocation number ($k(i, m) = k$, $i = 1, \dots, n(m)$, $m \ge m_0$).
Heuristically stated, the latter case obviously represents a ‘less favourable’
or less regular model with respect to asymptotic theory than the case of fixed
n, due to the presence of infinitely many unknown incidental parameters. In
situations of real application the choice of a model with $\alpha > 0$ means that a
‘perturbation’ of a model with finite parameter number is considered, which
means reducing idealization.


Example 3.4.8 In Example 3.4.7 let n(m) = p — q, i.e. the number of groups 
is a minimal one. According to Example 3.4.4 one obtains 


o(Brcm) ae R(Lom(Pw,, er P,,)) 


== Rl (Zim care Zee ta) 


Byom = Y..m — Bec Bm 


(Fn = (b(é, m))-2 YS 2, Ili, m) := {ren aa 
— 


jeI(i,m) 


é m 
= Da k(1, my}, oe Ino ys is Wins hm analogous) : 
l=1 


i=1 


) 


Geometrically this means that the affine manifold described by
$(o(\hat B_{2Cm}), \hat B_{1Cm})$ (cf. Example 3.3.1) is set just through the
p − q group means. For p = 3, q = 1 one obtains the grouping estimator of (3.3.7).
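
The geometric statement of Example 3.4.8 can be sketched for the simplest situation, assuming a bivariate relation and exactly two groups, so that the fitted line is the one passing through the two group means. The function name `grouping_estimator`, the label encoding and the toy data are hypothetical.

```python
def grouping_estimator(x, y, labels):
    """Return (intercept, slope) of the line through the two group means."""
    if not (len(x) == len(y) == len(labels)):
        raise ValueError("x, y and labels must have equal length")
    groups = sorted(set(labels))
    if len(groups) != 2:
        raise ValueError("this sketch handles exactly two groups")
    means = []
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        means.append((sum(x[i] for i in idx) / len(idx),
                      sum(y[i] for i in idx) / len(idx)))
    (x_a, y_a), (x_b, y_b) = means
    if x_a == x_b:
        raise ZeroDivisionError("group means of x coincide")
    slope = (y_b - y_a) / (x_b - x_a)
    # The line through both group means also passes through the overall mean.
    intercept = y_a - slope * x_a
    return intercept, slope


# Hypothetical noise-free data on the line y = 1 + 2 x, split into two groups.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
labels = [0, 0, 1, 1]
b1, b2 = grouping_estimator(x, y, labels)
```

Because the overall mean is a weighted average of the group means, the intercept computed here agrees with the form "overall mean of y minus slope times overall mean of x".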


Example 3.4.9 In the model of Section 3.4.1 let, in addition, a Wpm € @np—¢> 
R( Mim) <— Wem <= Wm» be given. Then one can consider the CIVE Ppm Cor- 
responding to @p, as described in Definition 3.3.1. It always exists for 
sufficiently large m. Let Wm € Mn) ms Wem € Mths qx nim be such that 
Wm = R(W),), Wem = R(W),W p,). According to Example 3.4.4, then 


Lon = R( Zin Win W pm)? 


This yields a class of alternatives to the CIVE $\hat{\mathcal L}_{Cm}$ for
$\mathcal W_m$; admitting certain random $W_{Pm}$ with constant n(m), one obtains
the class of ‘ordinary estimators’ of Villegas (1966) (cf. Section 3.4.6).
Applying this procedure in the case of Example 3.4.7 for r = 2, q = 1 yields


n(m) n(m) 
°. Ay * = * 
Bopm a Sy Yi.mUim | Li.mUim 
x 


i i= 


for, Ut, = Wh )ierntw € Vikan ATL) = AWS (Py. — Py). This 
means applying a procedure of simple IV-estimation according to Example
3.4.6 after a preliminary data reduction by averaging (cf. also Section 3.5.1). 


3.4.4 Asymptotic normality under normal distribution 


Let us now investigate the limiting distribution of the canonical instrumental
variables estimator for $\mathcal L$. We confine ourselves to the explicit case,
in which an asymptotic normal distribution can be proved for the (suitably
normalized) estimator of the $\mathbb R^{q\times(p-q)}$-valued parameter B. The CIVE
$\hat B_{Cm}$ depends on the observations via $S_m$ and $Q_m$. Here the general form
of $Q_m$, in particular the general form of the IV-space $\mathcal W_m$, requires a
restriction to normal errors.

As a first step we show the asymptotic normality of the (suitably normalized)
matrix $Q_m$. This result forms the basis for proving asymptotic normality
not only of the CIVE but also of the alternative estimators to be considered
in the next section.




Lemma 3.4.8 Under (V,), (N) we have 


L£{m'¥2(On "t Gn) ae Nx pOpsxps At) 


moo 
i 8 
ae = 2o(L'¢ ®& dt) OF ate 40,(G9° & 2) L ) : 


n(m) 
Proof. Note that 9 = ae Let Py, = >) CimCjm be a spectral decomposition 


‘—1 
of Py. Since H,Q%2 = = G@), we have 
m!2(Qom ie: ae Saas m'2(Qom ae E5Qom) 
n(m) 
ars me yy (ZimCimCimZ m cars EZ mCim@j, alae (9) 
i=1 


Let A“) be the covariance matrix of (9). Since the summands of (9) are inde- 
pendent random variables, 4%) can be calculated using [A 3.19]; one obtains 
AG) ——— Ae), 
mt m—co 
Now it suffices to show that for all K€M,,, the expression m1? 
x tr[K(Qom —Gn)] either almost surely vanishes or is asymptotically 


distributed as N(0, K’ AWK). Here one can restrict oneself to K € My. Let 


D := (Py. + 5)¥? and let 


Pp 
DED = ¥ Addi, 4, € RY, ¢ = 1,::., 0 


i=1 


be a spectral decomposition of DKD. Then it suffices to show that for 
i=, oa eatery 1 


n(m) 
mV? 2 ((d; DZ nCim)® — Ly(d;D~'ZmCim)”) (10) 
— 
either almost surely vanishes or is asymptotically distributed as 
N(0, (d; © d;) D+ A D-*(d; ® d,)) (because of the independence of dd ;,D1Z » 
and djd;D"Z,,, k + 9). Now, if D-*d; € J+, then (10) almost surely vanishes. — 
n(m) 
If D-1d; ¢ J+ holds, then Y) (d;DZyCim)® 18 %nms,,-Aistributed with non- 
centrality parameter i= 


n(m) 
bm = Y (G;D Mn Cim)” 


i=1 
= 4,D"M,Py, MD \d;. 
Hence (10) can be represented as

$m^{-1/2}\sum_{i=2}^{n(m)} (y_i^2 - E y_i^2)
+ m^{-1/2}\big((y_1 + \delta_m^{1/2})^2 - E(y_1 + \delta_m^{1/2})^2\big)$,

where $\{y_i\}_{i\in\mathbb N}$ is a sequence of independent N(0, 1)-distributed random
variables. Here, in case $\alpha > 0$, the first summand is asymptotically normal; in
the case $\alpha = 0$ it converges to zero in quadratic mean. The second summand is
equal to

$m^{-1/2}(y_1^2 - E y_1^2) + 2(\delta_m m^{-1})^{1/2}\, y_1$

and hence, by virtue of $m^{-1}\delta_m \to d_i' D^{-1} H D^{-1} d_i =: \delta$, it
converges in distribution to N(0, 4δ). Hence (10) is asymptotically normal, where
the limit of the variance equals the variance of the limit distribution. Since
this is also true for (9), the assertion follows.
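
The limit variance 4δ in the proof above can be checked against the exact variance of a shifted squared standard normal variable, Var((y + δ^{1/2})²) = 2 + 4δ: multiplying by m^{-1} with δ replaced by δ_m and using m^{-1}δ_m → δ leaves 4δ. The following sketch verifies this identity by numerical integration; the function names and the value δ = 1.5 are hypothetical choices for illustration only.

```python
import math


def normal_expectation(g, half_width=12.0, step=0.001):
    """Approximate E[g(y)] for y ~ N(0, 1) with the trapezoidal rule."""
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(n + 1):
        y = -half_width + i * step
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * g(y) * math.exp(-0.5 * y * y)
    return total * step / math.sqrt(2.0 * math.pi)


def variance_of_shifted_square(delta):
    """Variance of (y + sqrt(delta))**2 for y ~ N(0, 1), computed numerically."""
    c = math.sqrt(delta)
    second = normal_expectation(lambda y: (y + c) ** 2)
    fourth = normal_expectation(lambda y: (y + c) ** 4)
    return fourth - second ** 2


delta = 1.5
numeric = variance_of_shifted_square(delta)
closed_form = 2.0 + 4.0 * delta
```

The constant part 2 of the variance is contributed by the term y_1² − E y_1², which is negligible after division by m; only the cross term 2 δ_m^{1/2} y_1 survives in the limit.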


~ 


For the model assumption (V3) one obtains

Lemma 3.4.9 Under (V3), (N), (C3), 

£{(m — n(m))¥? (SY — 2, — Han) oY Want ventas se 
where 

A'$} = 2(2, © Zr) My + 40 (Hs ®@ Zr) Ty. 
Proof. The proof is analogous to the proof of Lemma 3.4.8, if we normalize 
with (m — n(m))¥? = (dim #4)? and observe that R(S,,) = J holds almost 
surely. 
In the following lemma we give a representation of the CIVE $\hat B_{Cm}$ that is
useful for the proof of asymptotic normality as well as of asymptotic efficiency.
Lemma 3.4.10 Under (V$_\nu$), (Ex), (C1), $\nu$ = 1, 2 or (V3), (Ex), (C3) we have


Bom =a oY R(QmEm)), m= mM (11) 
for 

Coes nee + Sx) Fe La. (12) 
where 

Ch pi (P31 + Sp) Fa'Ls (13) 

Gag)! = 2;0(B)*. (14) 


Proof. The pose goe (11), (12) ) follows immediately from Theorem 3.3.1 (b) 
if we put C,, = = BomEom and om = = A(Lg,.) = (oe Indeed, by Theorem 
3.3.1 (c) Qom can be replaced by Q,, in (b) (for A = n(m) m-(63,, — o?) under 
(V.)). The convergence (13) results from Lemma 3.4.5 (b), Lemma 3.4.3, and 
Theorem 3.4.1. in conjunction with Remark 3.4.2. Taking into account 
Ey € Mex» by Lemma 3.4.5 (ce), we obtain 
ROys)* = (Fg (Pax + 8p) Fs") o(B)! 

= P,(Pz. + So) Fxo(B)t = A PePs.F gL} + SoL4) 

= R(FgP5iJL}, + 2L4) = Z:0(B)* 
in view of [A 1.6a)]. 


Now we need a convergence statement on $\hat B_{Cm}$.




Lemma 3.4.11 Under (V1), (N), (Ex), (C1) we have
$\hat B_{Cm} - B = O_P(m^{-1/2})$.


Proof. Let us consider the representation of Bom according to Lemma 3.4.10, 
which may be written as 


RL...) = R(OnEm)s 


or, since R(O,,Cn) € OM,xip—q) is equivalent to [LiOnEn) = p—q by 
[A 1.3] and part (b) of [A 1.6], as 


Lt... = QnEn\LQmEn)?- 
Consequently 
mL Lg = m'2(Bom — B) 
= 0}? OnCn(LoQmEn)? 
= mL! (Qn — LeGnl's) En(LOQmEm)*- 
The assertion now follows from Lemma 3.4.8 and from 
En(LQmE mn)? > Eoo(L 600)? G1 
according to Lemma 3.4.7 and Lemma 3.4.10, where L',Cool =p —q isa 


consequence of (14). 


With this we can now show asymptotic normality of m1'2(Q@) — L,G\?L;). 
Let 
Ry, := m8z,.w,, —(m — n(m)) m2, + ony (15) 


Lemma 3.4.12 Under (V,), (N), (C2), » = 1, 2, 3, we have 


L{m?Ry} meer VK Ue Ae) 


where 


~ 


Mo, = 2(1 — x) (Zp @ Zt) Pp + 40 ((Ho — H) © 2) Lp. 


Proof. For m = n(m) 
J'Sz,.w,J ~ Wm — nlm), 2S, Py+M mn) 


and 
mM *MomPy+Mom = (m —n(m)) mA am = Ayn — Am a? Ay —Hi 


holds by (C2). For n(m) = m, $R_m$ vanishes. With [A 2.11] the assertion follows,
where $N_{p\times p}(0_{p\times p}, 0_{p^2\times p^2})$ is understood as a one-point distribution.


Lemma 3.4.13 Under (V2), (N), (Ex), (C1), (C2), we have 


L{m'2(0,, cai LpGinl'z)} FSA Nox ppp» AS”) 
where 
A? = 2o(Z; @ ZX) Ip — 2x2q7((Z; @ Zr) JT Se Te x;)) 
+ Qo®q 322, + 40 y(LgG Ls @ 2) I. 
Proof. First, by Remark 3.3.7, BY, = BY), holds for sufficiently large m, 


if BY), is the CIVE for V, = {o7X} and o? the true parameter. Furthermore, 
we have 


O® — L,G°L, = QM — L,GYL', — n(m) m-\(62_ — 02) Z 


because of GO) = G® + H,,. The form of 6%, according to Theorem 3.4.3 for 
Lom = o(BY,) = o(BY wy )) in H,, implies 


Gtm — 0 = gt tr [MD n(mQz,, — Xr) 
= 73 tr [En Om] + 7? tr [IT (m8 7,27, 


— (m — n(m)) m+ z:)] ; 
Now . 
m2 tr [Tn LpH L's] 


= tr [mi(BYy — BY) (LY 21+) (BY), — B) Hy] 


for — Ligay- By Lemma 3.4.11 and Theorem 3.4.1 the above expression is 
op(1). Similarly, since 


(m — n(m)) m+ Han = Hom — Hy, 
(see also (v), (viii), Section 3.4.1) and since (C 2) is assumed, 
tr [IZ,,(m — n(m)) m-? LgH 4nL',| = op(m-1?). 
Hence, with R,, from (15) 
QD — LGD Ly = (Lp — n(mm) (mg) E1Y,) (QD — LyGOL5) 
(mm) (mq)* ELT gn + op(m-2) 
= (Ly. — nm) (mq) Z-17') (QD — L,09E5) 


4. n(m) (mg)? ET'Ry, + op(m-"?) (16) 




in view of Lemmas 3.4.6 (a), 3.4.8, and 3.4.12. The independence of Q 
and R,, implies the limit distribution for the above expression; we calculate 


ag SIT’ AY = 202g! BIT (Ze @ Et) + 40q-? E-DLILZGOL’, 
= 2aqt EIT’ (Z; @ Ei) 
xg? S Il’ AVITE, = 2odq?2E,E% tr UIE IE] = 2u8q 3B, 
og? IT’ Ayers! = 2(1 — «) og 3d. 3, 
and thus we obtain 4?). m 
From Lemmas 3.4.8 and 3.4.9 we directly infer the result for Q®). 
Lemma 3.4.14 Under (V3), (N), (C3), we have 


L£{(m wi n(m))? (Q°) ar Go) Tso Nox p(Opxp: A$”) 


where 
A?) — (1 pt «) AY as «2A3) 


= 2a(Z, © 2) Ip + 40, (GO © ZX) L; 


With these results we are now able to establish the asymptotic normality
of the CIVE $\hat B_{Cm}$.


Theorem 3.4.5 Under (V$_\nu$), (N), (Ex), (C1), $\nu$ = 1, 2, or (V3), (N), (Ex),
(C3), we have


LE (mi m))"? (B, — By eee EG No Onxo- 0)? Dos) 
where 
Dog = «G-L5(Z; + Lgl yg) Lp) G1 © Ly’ 2c Lz 
+ G\(Gy — oly) G1 @ Lt 2,Lt. 
Proof. Let 


CF := Cre b Coy 


for Oy» from Lemma 3.4.10. As in Lemma 3.4.11 we use Lemma 3.4.10 to 
obtain the representation 
(i(m))"? (Bom — B) = L4{70(n))"? (On — LeGmL) En(LoQmOm 
Fi 
From (Ona)? —.> C*G-1 and from Lemmas 3.4.8, 3.4.14, and Remark 
3.3.7 the assertion of the theorem follows for 


Dog = (G-10*" © L5') Ag(C*G © Lz). 




The starting point is Lemma 3.4.7. It states that the matrix $Q_m$ has a
singular probability limit:


$Q_m \xrightarrow[m\to\infty]{P} L_B G L_B' \in M_{p\times p}$, $r[G] = p - q$, (18)


which by $B = o^{-1}(\mathcal R(L_B G L_B'))$ identifies the unknown parameter B. Now it
can be shown that $r[Q_m] = p$ holds for $m \ge m_0$.


We consider the problem of consistent estimation of B on the basis of (18). 
Let 


fs Re@+vr _, Rua 
be a function with property 
{UL xGL) = B, W(B,G@)« Moxip—q X Mpa 1G] =p — 4. (19) 


Thus the function f applied to the p(p + 1)/2 different elements of a symmetric 
matrix Q (i.e. to [Q, cf. [A 1.10]) always yields B if A(Q) = o(B). If f is 
continuous in an open neighbourhood 4* of 


A= PLpGLy | BE Moxip—g> & € Myo AG] = p —H, 
then an estimator B,, of B is obtained in a natural way: 
By = (0;QOn). (20) 


Because of (18) and the continuity of fin an open neighbourhood of LCi. B,, 
is consistent. If f is additionally assumed continuously differentiable on 
A*, then the asymptotic normality of (i%(m))? (Q, — Lg@,L’;) and [A 2.10] 
can be used to derive the relation 


> 


Boe Bb =f eh On LCL + op((i(m))-¥”) , (21) 


where df denotes the total derivative of f. Now, differentiation of (19) with 
respect to B yields 


df(P,LeGL,) 030 ,LeGL, = Lyyp—ay> (22) 


and differentiation with respect to g, if we put G =f mP RA AE Re-V7—a+/2 
yields 


df(P",L2GL 5) Ol’, L2GLy aye Ogp—a)x (p—ap—at 12 * (23) 
Using a perturbation expansion, one easily calculates ézf,L 2GL',and gl, L Pan 
For this purpose one puts B, = B+ AB, gg =9 +4, and Gy= G+ AG,
(for I"), g@ =% I’, qG% = 9) and by taking into account (3.3.89) one gets 




L3,G,L3, = LpGLy + AL,GBiLA' + LB,GL’, + L,G,L';) + O(22) and 
hence 


Og LeGL, = Pil(Lp@ @ Ly) + (Ld © LeG) Lip gas] 
= 2f(Ln6 ® Le), 
Of LnGL, = f(Ly @ Ls) I',-g 
Inserting these expressions into (22) and (23) yields the linear constraints on
af(l', LpGL;,). 


Now relation (21) in conjunction with (22) and (23) suggests a class of estima- 
tors which contains those of the form (20) as a special case. The definition 
again relates to the general model according to Section 3.4.1. 


Definition 3.4.1 Let the model {P| 8 € O} be described by (V,), vy = 1, 2, 3, 
(Ex), and further assumptions entailing (18). An estimator (Bea of the 
structural parameter B is called an asymptotic Qm-estimator if there exists a 
mapping p: O —> Mapa) x pip+1)2> P(P) = Cy 80 that 


(a) By — B= OLOn — LnGnbh + of((ifm)*), 6 O 
(b) C,4, = Jo, o€O 
for 
To 2= (Lqip-a) | Sq¢p-a) x (p-antn-a1) 2) 
Ay := (26 ,(L2@4 @ Ld) iP (Le @ Lz) P,.). 
First we show that the matrix $A_\vartheta$ of the linear constraints on $C_\vartheta$ has full rank.


Lemma 3.4.15 Under (V,), (C1), » = 1, 2 or (V3), (C3) we have 


(p—Q)(P—q+1)/2+9(p—9) 
E Mp + /exip— D(p—q+)/2+q(p—-Q1° 


Proof. From the above calculation of Ay using perturbation expansion we 
see that 7145] < (p — q) (p —q + 1)/2 + a(p — q) implies that there are 


BYE Max (p-a)> FE Mpa, 

[B*, G*] + Op-aidp-- 80 that 

L,GBY LM + LABRGL', + LyQ*L'y = Opyp- 
Premultiplying by L;’ yields 

BYGL,, = 07255 


which implies B¥ = 04y(p-q)> G* = O(p-q) x(p- This is a contradiction. ™ 


Now it turns out that the CIVE $\{\hat B_{Cm}\}_{m\ge m_0}$ is an element of this
class of estimators.


Theorem 3.4.6 If (V1), (N), (Ex), (C1); or (V2), (N), (Ex), (C1), (C2); or
(V3), (N), (Ex), (C3) are satisfied, the CIVE $\{\hat B_{Cm}\}_{m\ge m_0}$ is an
asymptotic $Q_m$-estimator. Here we have to put $C_\vartheta = C_{0\vartheta}$,
$C_{0\vartheta}$ being defined by


Cop 2= (G-(Co Lp)" Coo & Ly) is 


with Oo, from Lemma 3.4.10. 
Proof. As in the proof of Theorem 3.4.5, one obtains the representation (3.4.17) 
and from this 


> 


Bom — B = ((C,QnLo)-t C7, @ Le’) Fp Qm — LaGnbis- 
Now the property (a) of Definition 3.4.1 follows from 
((CQmLo)* Cp, ® Lz’) Py — Cop = on(1) 


and from Lemmas 3.4.8, 3.4.13, and 3.4.14. To prove the property (b) we 
calculate 


Cool’ y(LaG @ Lt) = (G-"(OopLg) 3 Cog © Lh’) (IgG © Lt) 
+ (G(GooL)-! Ooo @ L5') (Lé @ LpG) Ip-a.a} 


= Oq(p-9) x (p-g)(p-a41)/2> 


Remark 3.4.5 Theorem 3.4.6 illustrates that the class of asymptotic
$Q_m$-estimators is not restricted to estimators based on the construction
principle described in the beginning. Indeed, Theorem 3.3.1 shows that
$\hat B_{Cm}$ is not only a function of $Q_m$ (compare also Remark 3.3.8). Further
examples of asymptotic $Q_m$-estimators will be discussed at the end of this
section. The limit distribution statement for asymptotic $Q_m$-estimators is now
a simple consequence of the definition and the asymptotic normality, shown in
Section 3.4.4, of $(\tilde n(m))^{1/2}(Q_m - L_B G_m L_B')$.


Theorem 3.4.7 Under (V1), (N), (Ex), (C1); or (V2), (N), (Ex), (C1),
(C2); or (V3), (N), (Ex), (C3) we have for an asymptotic $Q_m$-estimator
$\{\tilde B_m\}_{m\ge m_0}$ satisfying Definition 3.4.1 for $C_\vartheta$ that


£4 (i(m))" (Bn aa B)t Gear Nax(p-q%qx(p-9)» DF({Bnu}m=m,)) ? 


MEM 


GE Sa 24 
DB ere) an CL Al Cs ( ) 


with A, from Lemmas 3.4.8, 3.4.13, and 3.4.14. 




The notation D%({B,,} m>m,) Shall henceforth be used for the covariance matrix 
of the limit distribution of (i%(m))¥! 2 (B,, — B). The following discussion of 
asymptotic efficiency relates to optimality with respect to D%({B,,} 


m=m,) 


Definition 3.4.2 Let the model {P»|  € O} be given by (V,), » = 1, 2, 3, (Ex), 
and further assumptions which entail (18) and (24). Then an asymptotic Qy- 
estimator (Bis is said to be asymptotically efficient if for each asymptotic 
Qn-estimator By ei 


DB eon) SD Belo), . PSO. 


In the treatment of asymptotic efficiency we now distinguish between the
cases $\alpha = 0$ and $\alpha > 0$.


Theorem 3.4.8 Under (V1), (N), (Ex), (C1); or (V2), (N), (Ex), (C1),
(C2); or (V3), (N), (Ex), (C3), in the case of $\alpha = 0$ for arbitrary asymptotic
$Q_m$-estimators $\{\tilde B_m\}_{m\ge m_0}$, $\{\tilde B_m^*\}_{m\ge m_0}$ we have


$\tilde B_m - \tilde B_m^* = o_P\big((\tilde n(m))^{-1/2}\big)$.
Proof. Let (Bos, and (Bea, be estimators satisfying Definition 3.4.1 
for C', and C%, respectively. Then B,, — B*, = (Cy — Ct) P(Qm —- LpGnL',) 
4 op((i4(m))-™?).. Hence it suffices to show that 

(Cy — C5) 1 AL, TE Oq¢p—a) x p(p+1)/2* 
For this it suffices to prove 


RPA Ip) S RAy). (25) 
Now 
Ay = 40, (LeQoL'g @ 2) Tp 


by Lemmas 3.4.8, 3.4.13, and 3.4.14, and 
PAL, = Fi (Lz @ Ip) (Gols © Xr) Py, 
RAs) = ALP, (Ly @ Lj)) + RL ,(Le @ Ls) Py) 
= KPi(Ly ® Lg) + AL; (Le @ Ls) 
= Pi(Ly @ Ip) (RLp-¢ @ Lg) + HZp-q @ Ln))- 
To prove (25) it is sufficient to show that 
KM (Lz ® Ex) Fy) S RI p-q ® Lg) + Ap @ Le); 
and this is satisfied because of 
HRI p-q @ Lt) + RIp-q © Lp) = AIp-q © (Ly 1 Ls)) = RPP. 
With Theorems 3.4.5, 3.4.6, and 3.4.8, and Remark 3.4.3 (a) we obtain 




Corollary 3.4.1 Under the assumptions of Theorem 3.4.8 for each asymptotic 
Qn-estimator (Bees we have 


D}({Bu}mzm,) = FG G7? @ Ly’ ZeL5. 


Let us now consider the case $\alpha > 0$. Here the asymptotic $Q_m$-estimators are
not asymptotically equivalent in general (in the sense of Theorem 3.4.8); then 
the structure of the limit covariance matrix in conjunction with the constraint 
(b) from Definition 3.4.1 allows an optimality statement. 


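Theorem 3.4.9 Let the model $\{P_\vartheta \mid \vartheta \in \Theta\}$ be given by (V1), (N),
(Ex), (C1); or (V2), (N), (Ex), (C1), (C2); or (V3), (N), (Ex), (C3), and let
$\alpha > 0$. Let $\{\tilde B_m\}_{m\ge m_0}$ be an asymptotic $Q_m$-estimator satisfying
Definition 3.4.1 for $C_\vartheta$. Then:


(a) If $C_\vartheta = C_{0\vartheta}$, $\vartheta \in \Theta$ ($C_{0\vartheta}$ from Theorem 3.4.6), then
$\{\tilde B_m\}_{m\ge m_0}$ is asymptotically efficient.


(b) $C_\vartheta = C_{0\vartheta}$, $\vartheta \in \Theta$, is necessary for the asymptotic efficiency
of $\{\tilde B_m\}_{m\ge m_0}$ under (V1), (V3) and under (V2), $\alpha < 1$.


(c) If $\{\tilde B_m\}_{m\ge m_0}$ is asymptotically efficient, then
$D_\vartheta(\{\tilde B_m\}_{m\ge m_0}) = D_{0\vartheta}$ ($D_{0\vartheta}$ from Theorem 3.4.5).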

Proof. For fixed $\vartheta \in \Theta$ we consider the problem of minimizing $C\Gamma_p\Lambda_\vartheta\Gamma_p'C'$,
$C \in M_{q(p-q) \times p(p+1)/2}$, under the constraint $CA_\vartheta = J_0$. The assertion (a) is proved
if $C_{0\vartheta}$ is the solution of this problem for all $\vartheta \in \Theta$ (Theorem 3.4.7, Definition
3.4.2). Then (c) follows from Theorems 3.4.7, 3.4.6, and 3.4.5. Assertion (b)
is proved if under the given assumptions the solution is uniquely determined.

From CA, = Jo we obtain 


CEUs) La) to aes One neon aie (26) 
which implies 
Omens Ol, p(Lg @ Lg) Pp-q( AL, © AL) 
for each A € M,-. Let S = E; + LyL',; then & > 0 and 
CP Al C= CP (Ay + 2a0lgh, @L,0,) PC’ 
= OF [Ay + 20LgLy @ Lg Ly + 405 (oI pg — Gy) Ly © Zr] f',C' 
+ 4CP [Lz(Go — «I pq) Lz @ SX) P,C'. (27) 


Let &®) := 2a?q-'I)(v) for » = 1, 2, 3; write & in the sequel. Using Lemmas 
3.4.8, 3.4.13, and 3.4.14, we now show that 


P45 + 2oLgLy @ Lal + 4Lp(oLy-q — Go) Ly @ Xe), 
= 2h (£@ £) fF, — af (EX @ 2) + (2 @ 2) TE) F 
+a, 25 Ly, (28) 


3.4. Asymptotic theory for linear functional relations 349 


where the relations We =e a: 22, =Ip+T1 (p} and the properties of J (P)} 
([A 1.10]) are exploited. Analogously to the proof of Theorem 3.4.8 and Corol- 
lary 3.4.1 it can be shown that a 


409, (Lp(Go — oD pq) Li, © E;) £0" 
= GG) — oly) G-? @ LM E,L4. (29) 
From (27), (28), and (29) we obtain 
CEA 
= OF [2aF @ F — a( LIT (SZ, @ 21) + (2, @ EH) WE) 
+ 62,27) 0,0' + GG) — oly_q) G-* @ LY E,L4. (30) 
Furthermore, it follows from (26) that 


Cr, Lpl's = Oq(p-a) x1» 
CES, =0Ts (31) 


Cf (E @ E) TT = CF (E12 @ E12) Pynys 
= OF (S02 @ £12) (I, — Pgang) 
eo Of (ls @ Ln (bes 
Ope. (32) 


Observing that (2; ® 2) ie ( @ £) IT one thus obtains from (30), (31), and 
(32) that 
CP Al’ ,C' — GG — oly») G-! @ Lt ZL} 
= Of [205 @ E — a(FSI(E @ £) + (EF @ LZ) ME) + EE F,0' 
= GAC! 
for : pas ie 
A := 2f[S @ F — &(2x)3 (LF @ 2) MINS @ Z)L>. 
Note that 
A me 2af', (S12 @ U2) 
x ia &(2cx)-2 (S12 @ Suey [TTT (S32 @ FV2)] (S2 @ SU2) fe 
= 20h" (EU? @ LU2) (Ip — g&(2a)-* M1] (S12 @ Eu) F,, 


where 
Tf := q-US22 @ S12) TIT (E12 @ Fz) 
is a projection matrix, as can easily be verified. With (V,), v= 1, 30ra <1 
we have A > 0. For 
Cy = O41? 
Ay = A-we Ais 


C,,C;, is to be minimized under the constraint C,A, = Jo. A sufficient con- 
dition for minimization is 


R(O;) S R(A,), CyAx a Jo 
or 
RAC) GRAs), CAp =p: (33) 


Because of Lemma 3.4.15 and 4 > 0 the solution in C is uniquely determined. 
Under (V2), « = 1, we have g&(2x)"* = 1. Then let I7* € M,»,(p»4) be such 
that JT*II*’ = I,, — II. We have 


M(Z-¥2 @ T-U2) PA» 

= M(S-? @ X-1”) P, (Lp @ Ip) (26 @ Ly} (pq @ Ln) F5-4) 

34 Ope x [(p—a)(p-g+1)/2+9(P—9)] (34) 
Hence the constraint CAs = Jo can be written as 

OPEN? @ 212) (Le 1) (2-2 @ 2-42) FAs =o. 
Defining 

Cay = OF (E12 @ 51) 17 

Aye = 11? (2-8 @ 2-1) TAs, 


we see that C,,C,, is to be minimized under the constraint Cy, Ay. = Jo. 
A sufficient condition for minimization is 


KC.) S RAxy), CxrAnx = Jo 
or equivalently 
HIT*C i.) S RIT*A y), Cx Aan = So- 


For this (33) is sufficient because of rfl = aap and (34). 
To prove the theorem it now suffices to show that Coy fulfils (33). For Os 


from Lemma 3.4.10 we have 
RAC) = RAT, (Eos @ L4)) 
= AP (SU? @ LM) (Ips — qo(2x)-? M1) (S169 @ S12] )). 
Now 


T(E2?Oyg @ ZV2L4) = g- S12 @ E12) LA’ EEC’, = 0 


p? x q(p—-q) 
because of 2(Gi,) = RLZ,LR) = R(SL4) according to Lemma 3.4.10. Hence 
RMA Ops) = RT (ZCoo © ZL4)) 
= AP; (Lz © Z4L)). 


The proof of (33) now proceeds as in Theorem 3.4.8. By Theorem 3.4.6,
$C_{0\vartheta}A_\vartheta = J_0$.


With Theorem 3.4.6 we immediately obtain the following result.


Corollary 3.4.2 Under the assumptions of Theorem 3.4.9 the CIVE $\{B_{0m}\}_{m \ge m_0}$
is asymptotically efficient.
This is a useful optimality result since the most important alternatives to the
CIVE are also asymptotic $Q_m$-estimators.


Example 3.4.10 (Modified two-stage-LSE (2SLS)-estimators and related ones) 
In Section 3.3.1.4 in a special model we already mentioned the estimators 
(3.3.18) and (3.3.19), which are alternatives to the CIVE. Here we consider 
the generalizations of these estimators for the present model; it turns out that 
these can be construed as analogues of the 2SLS-estimators in simultaneous 
equation models, whereas the CIVE corresponds to the LIML (cf. Sections 
3.1.3, 3.3.1.5). 

First we consider the case of r = p. For the function f in the initial con- 
struction we set 


$$f(\Gamma_pA) = \begin{cases} \varphi^{-1}(R(AL_0)) & \text{if } R(AL_0) \in \varphi(M_{q \times (p-q)}), \\ \text{arbitrary} & \text{otherwise.} \end{cases} \tag{35}$$

Since $R(L_BGL_B'L_0) = R(L_BG) = \varphi(B)$ if $r[G] = p - q$, this function fulfils
(19). Moreover, $f$ is continuously differentiable on an open neighbourhood
$\mathcal{A}^*$ of $\mathcal{A}$; indeed, $f$ may be represented as

$$f(\Gamma_pA) = \begin{cases} L_1'AL_0(L_0'AL_0)^{-1} & \text{if } r[L_0'AL_0] = p - q, \\ \text{arbitrary} & \text{otherwise.} \end{cases}$$

Now, as

$$\mathcal{A}^* = \{\Gamma_pA,\ A \in M_p \mid r[L_0'AL_0] = p - q\}$$

is open, $f$ is continuously differentiable there, and $\mathcal{A} \subseteq \mathcal{A}^*$, the assertion
follows. If we put

$$B_{(1)m} = f(\Gamma_pQ_m),$$

then the estimator $B_{(1)m}$ is an asymptotic $Q_m$-estimator under the assumptions of
Theorem 3.4.6.
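
The representation of $f$ in this example can be checked numerically. The following is only a minimal sketch: the block layout $L_B = (I_{p-q}; B)$, $L_0 = (I_{p-q}; 0)$, $L_1 = (0; I_q)$ is an assumption made for illustration (the definitions in (xiv) are not reproduced in this section), and the function names and the dimensions $p = 5$, $q = 2$ are hypothetical. Under these assumptions the map $A \mapsto L_1'AL_0(L_0'AL_0)^{-1}$ returns $B$ when applied to $A = L_BGL_B'$ with nonsingular $G$.

```python
import numpy as np

def make_L(p, q, B):
    """Return (L_B, L_0, L_1) under the ASSUMED block layout L_B = [I; B]."""
    k = p - q
    L_B = np.vstack([np.eye(k), B])                  # p x (p-q)
    L_0 = np.vstack([np.eye(k), np.zeros((q, k))])   # picks the first p-q coordinates
    L_1 = np.vstack([np.zeros((k, q)), np.eye(q)])   # picks the last q coordinates
    return L_B, L_0, L_1

def f_2sls(A, L_0, L_1):
    """f(A) = L_1' A L_0 (L_0' A L_0)^{-1}; requires L_0' A L_0 to be nonsingular."""
    M = L_0.T @ A @ L_0
    return L_1.T @ A @ L_0 @ np.linalg.inv(M)

rng = np.random.default_rng(0)
p, q = 5, 2                                  # hypothetical dimensions
B = rng.normal(size=(q, p - q))              # "true" parameter matrix
H = rng.normal(size=(p - q, p - q))
G = H @ H.T + np.eye(p - q)                  # positive definite, so r[G] = p - q
L_B, L_0, L_1 = make_L(p, q, B)

A = L_B @ G @ L_B.T                          # noise-free limit L_B G L_B'
B_rec = f_2sls(A, L_0, L_1)
print(np.allclose(B_rec, B))
```

In practice $A$ would be replaced by the statistic $Q_m$, which is where the consistency modification discussed in the text enters; the sketch only covers the noise-free identity.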
Here $L_0$ can now be replaced by an arbitrary matrix $C \in M_{p \times (p-q)}$ if only
parameter values are considered for which $r[L_B'C] = p - q$. In particular, let
$C_\kappa \in M_{p \times (p-q)}$, $\kappa = 1, \dots, \binom{p}{q}$, be the $\binom{p}{q}$ matrices consisting of $p - q$ columns of $I_p$ (in
their given order). Let $C_1 = L_0$. Now, in case the parameter $B$ is restricted to
the set

$$\mathcal{A}_\kappa = \{B \in M_{q \times (p-q)} \mid r[L_B'C_\kappa] = p - q\},$$

the estimators $B_{(\kappa)m}$, $\kappa = 1, \dots, \binom{p}{q}$, resulting by replacing $L_0$ by $C_\kappa$
in (35) are also asymptotic $Q_m$-estimators. Indeed, since $\mathcal{A}_\kappa$ is open, the above
derivations carry over; moreover, it can be shown that Mqp-¢)(4x) = 1.

We adopt the convention of inserting a g-inverse $(L_0'AC_\kappa)^-$ if $R(AC_\kappa)
\notin \varphi(M_{q \times (p-q)})$, i.e. if $(L_0'AC_\kappa)^{-1}$ does not exist. Then (35) implies

$$B_{(\kappa)m} = L_1'Q_mC_\kappa(L_0'Q_mC_\kappa)^-.$$

Observe that under $(V_3)$ in the case $q = p - 1$, $B_{(1)m}$ is just an estimator of
the form (3.3.20) (cf. Lemma 3.3.1). Here $\lambda = n(m)(m - n(m))^{-1}$, i.e. $B_{(1)m}$
is the 2SLS-estimator modified to consistency in the general case
$\lim_{m \to \infty} n(m)(m - n(m))^{-1} \ne 0$. Under $(V_\nu)$, $\nu = 1, 2$, the estimator $B_{(1)m}$ is the
analogue to this; in the general case $q < p - 1$, $B_{(1)m}$ is the natural generalization
(the classical 2SLS-estimator and LIML in simultaneous equation models
correspond to estimators of $B$ in LIFU* in the case $p - q = 1$; see Section
3.3.1). For $p = 2$, $q = 1$ we have $\binom{p}{q} = 2$, and in the model of Example
3.4.7 (for $r = p$) we just obtain the estimators (3.3.18), (3.3.19).

In the literature, most investigations on the efficiency of estimators in
linear functional relationships (LIFU*) have centred on the comparison of
$B_{0m}$ and $B_{(1)m}$.

The usual adaptation of this estimation method to the case $q < r < p$
consists in replacing $R(Q_mC_\kappa)$ in (35) by


Ont! (I+ + IRC), 


where C% € Mt, (rq) consists of r — q columns of J,. According to Section 3.4.1 
(iii) and Lemma 3.3.2, as well as [A 1.5], the matrix Ff is a function of O,,; 


the resulting estimator 
os 1 4 , 
ple (QnF gi (I+ + TRC%))) if Ong! (I++ TRO2) € OMe <cp-w) 
1(On) otherwise 


is thus a function of Q,,. Using Section 3.4.1 (iii) and [A 1.5] it is easily shown 
that Bi,)m is also an asymptotic Q,,-estimator if the parameter B, is restricted, 
in accordance with Section 3.3.4 (ii), to the set 


A, = {B, € Max (ra) | r[L,C?) sta Gye 
Analogously to Remark 3.3.10 it can be shown that 
Bum = (Biuom ey rt Biiainn 8 tgs (p-tyole Bachem Mg x (ra) 
Bocoym = Lig Qi.C2( Lin Qn. x) 
Byjm = Ly, ZamM in = Ls ati 


(Lo, Lz, according to (xiv)). 


Example 3.4.11 Consider the case $r = p = 2$, $q = 1$. Let $\Gamma_2$ be determined
by (cf. [A 1.10]) $\Gamma_2 = \mathrm{Diag}[1, 1, 1/\sqrt{2}, 1]$. Let $f : \mathbb{R}^3 \to \mathbb{R}^1$,

$$f(x) = \begin{cases} (x_3/x_1)^{1/2}\,\mathrm{sgn}\,x_2 & \text{if } x = (x_j)_{j=1,2,3},\ x_1 \ne 0, \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$f(\Gamma_pL_BGL_B') = B \qquad \forall (B, G) \in \mathbb{R}^2,\ G > 0,$$

i.e. $f$ fulfils (19). Furthermore, for each point from $\mathcal{A} \setminus \{x \mid x = (x_j)_{j=1,2,3},
x_2 = 0\}$ there exists an open neighbourhood on which $f$ is continuously
differentiable. If the parameter value $B = 0$ is excluded we obtain, similarly to
the above, that the estimator

$$B_{Tm} = (q_{m22}/q_{m11})^{1/2}\,\mathrm{sgn}\,q_{m12}$$

for $Q_m = ((q_{mij}))_{i,j=1,2}$ is an asymptotic $Q_m$-estimator.
The estimator $B_{Tm}$ was introduced by Tukey (1951) and discussed by
Madansky (1959) and Dorff and Gurland (1961).
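
The identity behind this estimator is easy to verify numerically. The sketch below assumes $L_B = (1, B)'$, so that the distinct entries of $G\,L_BL_B'$ are $G$, $GB$ and $GB^2$; the positive factor `c` stands in for whatever scaling $\Gamma_2$ applies to the off-diagonal entry and does not affect the result. The helper names and the guard against a negative ratio are additions made for illustration, not part of the original definition.

```python
import math

def sgn(t):
    """Sign function with sgn(0) = 0."""
    return (t > 0) - (t < 0)

def tukey_f(x1, x2, x3):
    """f(x) = (x3/x1)^{1/2} sgn(x2); returns 0 where the formula is undefined."""
    if x1 == 0 or x3 / x1 < 0:   # the negative-ratio guard is an added assumption
        return 0.0
    return math.sqrt(x3 / x1) * sgn(x2)

def noise_free_moments(B, G, c=1.0):
    """Distinct entries of G * (1, B)'(1, B); c > 0 rescales the off-diagonal entry."""
    return (G, c * G * B, G * B * B)

# f recovers B for every B != 0 and G > 0, whatever positive scaling c is used.
checks = [(B, G) for B in (-2.5, -0.3, 0.7, 4.0) for G in (0.5, 1.0, 3.0)]
max_err = max(
    abs(tukey_f(*noise_free_moments(B, G, c=2 ** -0.5)) - B) for B, G in checks
)
print(max_err)
```

The sign factor is what forces the exclusion of $B = 0$: at $x_2 = 0$ the function jumps, so it cannot be continuously differentiable there.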


Example 3.4.12 (Minimum contrast estimators) We consider a modification 
of the principle of minimum contrast estimation (see Pfanzagl, 1969) to esti- 
mate B on the basis of the convergence (18). 

First let us suppose the general implicit case. Let 


PF: Rey QR 




be a continuous function with the property 
VQ EM: RQ) = £ € yp > FPG, £) < FUL, £’) (36) 
Whig SB aerate se os 


P,p—q* 
F is called a contrast function. A minimum contrast estimator (MCE) Sot me 
is an estimator which satisfies 


FLOm: ne) = inf F(L,Om, £). 
LELp pa 
From the compactness of &,p-, (see [A 3.1]) we infer the existence of an MCE 
with (nonrestricted) values in 2, ,-,; in the same way, the (strong) consistency 
can be shown if 0, +> G €M?>4 (analogously to Pfanzagl, 1969; see also 
3.5.4.1, Theorem 3.5.5.) 
In the explicit case we put 


GB ye—e | so(B)\e 


Let F be twice continuously differentiable (on an open neighbourhood of 
A X Max (p-q)- Let 


F(x, B) := a3 F(a, B) 

0,F'"(«, B) := 0,F"(x,B), —,8°’(a, B) := @3F"(a, B) 
and suppose 

[OF (P,L,GL5, B)| = ap — 9), 

V(B, G) € Mgx(p-q) X Mpg, IGl=p—g. “ 
Then, for B,, := @-'(Lm) if Ln € Ota ony) 

FL Gn, Bn)’ = Opa | (38) 
and because of (36), 

FUE SGLS By 0, (39) 


From (37), (38), and (39), the consistency of B,,, and the asymptotic normality 
of (((m))4? (Q,, — Lp@nLz), we derive 


B,, — B= —(0,F'(f,L,GL5, B))-1 0,F'(f",L,GL’,, B) 
X (Om — LpGnb’,) + op ifi(m))-1) 


= Cl (Qm — LeGnL'g) + op((m(m))-22). (40) 




By differentiating (39) with respect to $B$ and $G$ we conclude, as in (22) and
(23), that $C_\vartheta$ satisfies (b) of Definition 3.4.1. Hence, consistent MCE of the
kind described are asymptotic $Q_m$-estimators. It is easy to see that weak
consistency is already sufficient.

By this method Robinson (1977a) constructed an MCE for a model with 
dependent errors (see Section 3.5.4.1). This model includes as a special case the 
one (V,), r= p, W,, = R™, m = m considered here. For the special MCE 
derived, a relation (40) is shown; from the above it follows that this is an
asymptotic $Q_m$-estimator. However, in general the estimator is not optimal within
the class.

In accordance with Remark 3.4.5 also random contrast functions depending
on $Z_m$ can be admitted if appropriate conditions are imposed on the
convergence of the function and the relevant derivatives.


Remark 3.4.6 It turns out that the theory of asymptotic $Q_m$-estimators is
analogous to the Gauss-Markov theory in the linear model. Furthermore, in
view of Example 3.4.12 and [A 2.12], we note a relation to the theory of
minimum contrast estimation in the case of identically distributed observations.
Indeed, the MLE is asymptotically optimal within the class of the MCE
(Michel and Pfanzagl, 1971); the corresponding algebraic problem coincides
with the present one.
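
The algebraic problem referred to here is of Gauss-Markov type: minimize $C\Lambda C'$ (in the ordering of nonnegative definite matrices) subject to a linear constraint $CA = J$. As a hedged illustration, the sketch below uses generic random matrices `Lam`, `A`, `J` (hypothetical stand-ins, not the specific $\Gamma_p\Lambda_\vartheta\Gamma_p'$, $A_\vartheta$, $J_0$ of the proof) and assumes `Lam` positive definite and `A` of full column rank. In that case the standard solution is $C_0 = J(A'\Lambda^{-1}A)^{-1}A'\Lambda^{-1}$, and every other feasible $C$ gives a covariance that is larger by a nonnegative definite matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, d = 6, 3, 2                      # hypothetical dimensions

H = rng.normal(size=(n, n))
Lam = H @ H.T + n * np.eye(n)          # positive definite matrix (assumption)
A = rng.normal(size=(n, k))            # constraint matrix, full column rank a.s.
J = rng.normal(size=(d, k))            # right-hand side of the constraint C A = J

def optimal_C(Lam, A, J):
    """C0 = J (A' Lam^{-1} A)^{-1} A' Lam^{-1}, the Gauss-Markov type solution."""
    Li_A = np.linalg.solve(Lam, A)                  # Lam^{-1} A
    return J @ np.linalg.solve(A.T @ Li_A, Li_A.T)  # uses symmetry of Lam

C0 = optimal_C(Lam, A, J)

# Every feasible C has the form C0 + D with D A = 0; build such D by projection.
P = np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)   # projector onto R(A)-perp
min_eigs = []
for _ in range(200):
    C = C0 + rng.normal(size=(d, n)) @ P
    assert np.allclose(C @ A, J)                    # C is feasible
    diff = C @ Lam @ C.T - C0 @ Lam @ C0.T
    min_eigs.append(np.linalg.eigvalsh((diff + diff.T) / 2).min())

print(min(min_eigs) >= -1e-9)   # the difference is nonnegative definite
```

The cross term $C_0\Lambda D'$ vanishes because $DA = 0$, which is the same mechanism as in the classical Gauss-Markov theorem.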


Remark 3.4.7 Under the specification $(V_2)$ the alternatives to $B_{0m}$ according
to Examples 3.4.10-3.4.12 are functions of $\hat\sigma^2_{0m}$ and thus of $B_{0m}$ (since $Q_m$
is a function of $\hat\sigma^2_{0m}$). The preceding results imply in this case that such a
two-stage method does not yield any improvement of the CIVE. An analogous
result can be obtained when using the alternative consistent estimators of
$\sigma^2$ of Section 3.4.2.

The results obtained suggest further asymptotically efficient estimators in 
addition to the CIVE Bem. Let By, be a consistent estimator of B and 


x 


C,, = Fg. (Ps. + Sx) Fp La... 
As in Lemma 3.4.10 we show 

Crs Cys (41) 
under the corresponding assumptions. Let the estimator B,, be defined by 


Wok | OUROnCn)) if ROnCm) € (My x v-n) 


arbitrary otherwise 
Taking into account 


P(ROnCn) € 0(Mz x(p-a))) i Sag 1; 




By — B= L4(Qn — LaGnl'g) En(LQnn)? if 71LQnCnl =p —@ we 
show as in Theorem 3.4.6 that B,, is an asymptotic Q,,-estimator. Theorem 
3.4.9 (see also Theorem 3.4.8) and (41) then imply that B,, is asymptotically 
efficient. 

As the simplest consistent initial estimator we can use $B_m^* = B_{(1)m}$ from
Example 3.4.10, i.e. the modified 2SLS-estimator. Then we obtain


KEn) = Fy! (Pas + Sn) Fp OnE pg. (I+ + TRL) 
= Fy (F* + IRS** Ot Ln); 
for the resulting estimator, denoted by Bam, we have 
Bon = o7'(Fp,(I+ + RIOLSS OS L20))) 


if the right-hand side is defined. From [A 1.6] (b) and (c) we obtain, analogously 
to Remark 3.3.10, 


Bam a (Bizm t Borm)s Bins € WMaxcipry2 eae © a tee) 


Bozm = Ly QnSm 'QnLro(L20OmSm 'QnLoo)- (42) 
Bien = Lg ZomN ins (43) 


where Lyo, Lz) are defined according to (xiv). 


Corollary 3.4.3 Under the assumptions of Theorem 3.4.9 the estimator $B_{2m}$
defined by (42), (43) is asymptotically efficient.


Under $(V_\nu)$, $\nu = 1, 3$ it can be shown that in (42) the g-inverse '$-$' almost surely can be
replaced by the inverse '$-1$' since the matrix in question has full rank. Contrary to $B_{0m}$,
the estimator $B_{2m}$ is an elementary function of the observations, which does
not require the solution of an eigenvector problem for its calculation.


3.4.6 The general nonnormal case 


The asymptotic distributional and optimality statements obtained in
Sections 3.4.4 and 3.4.5 essentially rely on the assumption (N) of a normal
distribution of the observations. Now we investigate what results may be obtained
in the case of the general distributional assumption (3.4.1).

A crucial technical result in Sections 3.4.4 and 3.4.5 was the asymptotic
normality of $(\tilde m(m))^{1/2}(Q_m - L_BG_mL_B')$; for this especially the asymptotic
normality of $m^{-1/2}(Q_{Z_m,W_m} - E_\vartheta Q_{Z_m,W_m})$ under $m^{-1}\dim W_m \to \alpha \in [0, 1]$ had to
be shown. In the general nonnormal case suitable limit theorems are not
available, so we have to restrict ourselves to special cases. These are, firstly, the
cases $(V_\nu)$, $\nu = 1, 2$, $\alpha = 1$, and secondly the case $\alpha = 0$.




The case $\alpha = 1$


Here a restriction to $(V_\nu)$, $\nu = 1, 2$ is necessary because under $(V_3)$ the problem
of the limit distribution of $(m - n(m))^{1/2}(S_m - E_\vartheta S_m)$ arises. Under the
assumptions (C4) and (C5) the asymptotic normality of $m^{1/2}(Q_m - L_BG_mL_B')$
and hence also of the asymptotic $Q_m$-estimators $\{B_m\}_{m \ge m_0}$ can be shown. But
it turns out that $D^2_\vartheta(\{B_m\}_{m \ge m_0})$ depends on the third and fourth moments of
the underlying error distribution $P^\varepsilon$. Thus the class of asymptotic $Q_m$-estimators
is no longer interesting because in general there does not exist an
asymptotically efficient element (in the sense of Definition 3.4.2). Therefore we
restrict ourselves to giving the limit distribution of $m^{1/2}(Q_m - L_BG_mL_B')$ and
$m^{1/2}(B_{0m} - B)$ for $(V_1)$. The proofs are easily obtained from the central limit
theorem (Bunke and Bunke, 1986, Theorem 2.4.3) and the form of the covariance
matrix of matrix-valued quadratic forms [A 3.19].


Lemma 3.4.16 Under $(V_1)$, (C4), (C5), $\alpha = 1$, we have
Lim'(On — Gn)} moo Noxp(9pxp, Ae) 


Ay = F, — Ed, + 2G © ©) I, + 2G © G:) + 47, (4 @ 2] T,- 


Theorem 3.4.10 With $(V_\nu)$, (Ex), (C1), (C4), (C5), $\alpha = 1$, $\nu = 1, 2$ for the
CIVE $\{B_{0m}\}_{m \ge m_0}$ we have


£{mi!?(Bom — B)} aan? N4x(p-a) (04 (0-29 D?({Bom}m=m,)) 


DB.) aa Cro Gel Coe = GG ®&) Col 0,5 
+ G19 © Lt’ OP Cig + G © Lz Se Lt 
with Cos from Lemma 3.4.10. 


On the basis of Lemma 3.4.16, the asymptotic normality of all asymptotic
$Q_m$-estimators under $(V_1)$ can be shown; this can also be done for $(V_2)$.


The case $\alpha = 0$

In the case of $\alpha = 0$ the term $Q_m - L_BG_mL_B'$ is by Lemma 3.4.1 a linear function
of the observations, up to $o_P(m^{-1/2})$; from this the asymptotic normality of
$m^{1/2}(Q_m - L_BG_mL_B')$ easily follows. The result with respect to the class of asymptotic
$Q_m$-estimators then corresponds to the one obtained under normal distribution
in Section 3.4.5: the difference of two elements is $o_P(m^{-1/2})$, i.e. the elements
are asymptotically equivalent. The meaning of this result will be discussed in
Section 3.4.7.


Lemma 3.4.17 Let $\alpha = 0$. If $(V_1)$, (Ex), (C5); or $(V_2)$, (Ex), (C1), (C2),
(C5); or $(V_3)$, (Ex), (C3), (C5) are satisfied, then


$$\mathcal{L}\{m^{1/2}(Q_m - L_BG_mL_B')\} \xrightarrow[m \to \infty]{} N_{p \times p}(0_{p \times p}, \Lambda_\vartheta),$$
A» — 40°, (LpGL; © 2) DI. 




Proof. First we consider the specification (V,). By Lemma 3.4.1, because of 
EsQm = LpGnL', and Op(m-' tr [Py, }!?) = op(m-?), we have 


m'2(Qm, — LgGmL'z) = mS mPy, Min + MmPw,bm) + or(1). (44) 


Hence it holds for each K € IN, almost surely that 


m 
ml? tr [K(Om — LpGpL'g)] = 2m-12 Ye!” Py, Mi,KP 3S; + op(1), 
i=1 
(45) 
because almost surely €; € J,7 = 1. 
According to Bunke and Bunke, 1986, theorem 2.4.7), (45) is asymptotically 
normal if 


+0 


mM—>Co 


m =) 
(= reset ay max |JJ'KMpPy),e?2 


i 
¢=1 1<ism 


is fulfilled. With 
™ 
Oe we. |J’KM,Pye9”|l? = tr [KP KL;G,,03]| 


= + tr | Key hh, Chee 


m—-co 


and (C5), we obtain the asymptotic normality of (45) for tr [KP;KL,GL,] > 0 
and op(1) for (45) in the case tr [KP;KL;GL‘,] = 0, respectively. As Ay only 
depends on the second moment of P*, A» results from Lemma 3.4.8 for « = 0. 
Let us consider the specification (V.). First we obtain from the statement of 

the theorem for (V,) and from Lemma 3.4.10 as in Lemma 3.4.11 that 
BW, — B = Op(m-4/*) and thence (3.4.16). For R,, according to (3.4.15) we 
infer from Lemma 3.4.1 that 

R,, = Ry, + Op(m-"?), 

Ri, = mE nPytMn + MnPvsbn): 
Then D,R*, = O(m-) follows from (C2) so that Rj, = Op(m-1?), R,, 
= Op(m-1!?). From this, from (3.4.16), and Lemma 3.4.6 (a) it follows that 


© _ 1 .GL', = Q® — L,GML', + 0,(m-"2) (46) 


for » = 2 which proves the assertion. Furthermore 
OP = QP — nlm) m= (SP — E,) 
ee oF on n(m) (m wa n(m))-1 R,, = n(m) m* DpH a, Ls c 


As above we can conclude from (C3) (b) that R,, = Op(m-/2) and from this 
(46) for » = 3, which yields the assertion. 


From this lemma, the next theorem is an immediate result. 




Theorem 3.4.11 Let $\alpha = 0$. If $(V_1)$, (Ex), (C1), (C5); or $(V_2)$, (Ex), (C1),
(C2), (C5); or $(V_3)$, (Ex), (C3), (C5) are fulfilled, then for arbitrary asymptotic
$Q_m$-estimators $\{B_m\}_{m \ge m_0}$, $\{B_m^*\}_{m \ge m_0}$ we have

$$B_m - B_m^* = o_P(m^{-1/2}),$$
EAE parce BY) ee ON cea aes Oe) 
Dig = ove aa 4 


Finally we consider the special case n(m) = n, m = mp, i.e. the case of an 
instrumental variable of fixed dimension (Example 3.4.3). If we assume con- 
dition (C6) here, we immediately obtain 


mL Wy ——> LyT € MP4. (47) 


For this case, Villegas (1966) on the basis of (47) defined a class of estimators 
of B, the so-called ‘ordinary estimators’. This class results from the estimators 
Bom := 0 '(Lpm) in Example 3.4.9 if W>,, is admitted there as depending 
on the observations with Wp, > W €Minxn- The equivalent version 
of (47), 


m*Y,,W., ore BmX,,W,, ats En» E, "o> Onn 


(cf. (xiv)) can be put into correspondence with a linear regression model; the
class of 'ordinary estimators' can be related to the class of linear unbiased
estimators. The asymptotic normality of the 'ordinary estimators' $\{B_m\}_{m \ge m_0}$
was proved under the general distributional specification, where $D^2_\vartheta(\{B_m\}_{m \ge m_0})$
depends only on the second moment of $P^\varepsilon$. An optimization with respect to
$D^2_\vartheta(\{B_m\}_{m \ge m_0})$ analogous to the Gauss-Markov theorem proves the asymptotic
efficiency of the 2SLS estimator (more exactly of its system analogue; cf.
Example 3.4.10).

A variant of this method for general simultaneous equation models was 
given by Brundy and Jorgenson (1971). 

With the methods of Section 3.4.5 we can easily obtain an extension of this
result in the following way. On the basis of (47) a class of estimators, say
asymptotic $P_m$-estimators (for $P_m := m^{-1}Z_mW_m'$), can be defined, in the same
way as the convergence of $Q_m$ (3.4.18) led to the definition of the asymptotic
$Q_m$-estimators. Here the 'ordinary estimators' are a subclass; if we further
proceed as in Section 3.4.5, then the result of Villegas (1966) appears as a
consequence of the corresponding theory. According to Remark 3.4.6 the
optimality result with respect to the class of asymptotic $P_m$-estimators also
corresponds to the Gauss-Markov theorem. Furthermore, it can be shown that
the asymptotic $Q_m$-estimators are in this case efficient asymptotic
$P_m$-estimators. The formal proof is left to the reader.




3.4.7 Final remarks 


Let us now consider the problem of asymptotic efficiency within the class of
consistent asymptotically normal estimators (in the sense of [A 2.8]). Let $d_\nu$
be the dimension of the parameter space for $(P^\varepsilon, B)$ under (N), (Ex), according
to Example 3.4.1; then we have


$$d_1 = q(p - q), \qquad d_2 = q(p - q) + 1, \qquad d_3 = q(p - q) + r(r + 1)/2.$$


The following theorem indicates the lower bound for the covariance matrix
of asymptotically normal estimators of $B$ for a fixed sequence of parameters
$\{\xi_i\}_{i \in \mathbb{N}}$.


Theorem 3.4.12 Let a model according to Example 3.4.1 be given by $\{P_\vartheta \mid \vartheta \in \Theta\}$;
let O be defined by OL P® KX Max (pq) X {{Edien} for a given {Eien € RP 
fulfilling (C2), and by assumptions (V,), (N). Then for each estimator {Bm\m>m 
such that 


B,, =e aL aes {Eiiew)> m = Mo 
with 

L{mi2(Bp Ge B)} Foes NG ca is eae D?({Bm}mzm,)) 
one has 


D?({Bm}mzm,) 2 Dooo — (#a,] 
2.e. for almost all (in the sense of a,) parameter values (X;, B), where 
Doge lg GL, SL: 


Proof. It suffices to demonstrate local asymptotic normality of the model for
fixed $\{\xi_i\}_{i \in \mathbb{N}}$ (see [A 2.7, 2.8]); here we will omit this proof. A proof for a general
model of independent not necessarily identically distributed observations can 
be found in Ibragimov and Khasminski (1979, chap. II); compare also Philippou 
and Roussas (1973), Roussas (1972), Andersen (1970b). 


Hence $D_{\vartheta 00}$ is a lower bound for known nuisance parameters $\{\xi_i\}_{i \in \mathbb{N}}$ and thus
also a bound in the case of (partially) unknown $\{\xi_i\}_{i \in \mathbb{N}}$, i.e. in the present model.
But this bound is attained by the asymptotic $Q_m$-estimators, especially the
CIVE, in special cases only. According to Remark 3.4.3 this is the case if
$\alpha = 0$ and


H =H, (48) 


holds. In particular, (48) is fulfilled in the adequate case; (48) can be considered
as a condition of 'asymptotic adequacy' of a model with (A) in case (A)
is not necessarily satisfied by the data. In the case of $n(m) = n$, $m \ge m_0$, there




is also an interpretation as an analogue of a condition in the model (LIFU-) 
of Section 3.3.4, 


Xy-w = On <p> 
i.e. to a condition of ‘full correlation’ between the (partially) nonobservable 
random variable ~ and the instrumental variable w. 

If $\alpha > 0$, this lower bound (under (N)) is not attained by the asymptotic
$Q_m$-estimators, especially not by $B_{0m}$ (cf. Theorems 3.4.9, 3.4.5). Hence (for
the adequate case) the asymptotic efficiency of the MLE with respect to $D_{\vartheta 00}$
is impaired, due to the presence of the unknown incidental parameters N>,, in 
the model, the number (of real components) of which increases indefinitely. 

The general case of a model with an indefinitely increasing number of 
unknown incidental parameters with a structural parameter to be estimated 
(see Remark 3.4.1) was investigated from the point of view of asymptotic 
efficiency of estimators by Neyman and Scott (1948). They discussed the 
possibility of impaired efficiency of the MLE in the sense indicated. The 
possible occurrence of this situation in the model according to Section 3.4.1 
is the motive for considering the class of asymptotic $Q_m$-estimators. The theory
of this class permits the asymptotic efficiency of the MLE (or more generally of 
the CIVE) to be established with respect to a number of alternatives. 
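
How strongly incidental parameters can disturb the MLE is seen most easily in the classical paired-observation example associated with Neyman and Scott (1948), which is simpler than the functional relation treated here and is used below only as an illustration. Each pair $(x_{i1}, x_{i2})$ has its own mean $\mu_i$ and a common variance $\sigma^2$; the MLE of $\sigma^2$ then converges to $\sigma^2/2$, whereas dividing by the correct degrees of freedom restores consistency. The sample size, the seed and all variable names in this sketch are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 20000, 4.0                     # number of pairs, true error variance

mu = rng.normal(scale=10.0, size=n)        # incidental parameters, one per pair
x = mu[:, None] + rng.normal(scale=np.sqrt(sigma2), size=(n, 2))

# Within-pair residuals after estimating each mu_i by the pair mean.
resid = x - x.mean(axis=1, keepdims=True)
rss = (resid ** 2).sum()

sigma2_mle = rss / (2 * n)                 # MLE: divides by the number of observations
sigma2_corrected = rss / n                 # divides by the n remaining degrees of freedom

print(sigma2_mle, sigma2_corrected)
```

With these settings the first value should lie close to 2 and the second close to 4; the number of nuisance parameters grows with the sample, so the usual asymptotic guarantees for the MLE no longer apply automatically.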

For the case of a finite number of parameters (Example 3.4.3 under (A))
these alternatives to the MLE are asymptotically equivalent; this corresponds
to the situation in simultaneous equations models (see Theil, 1971; Schönfeld,
1971). Moreover, for this case the standard statements about asymptotic
efficiency of the MLE (cf. [A 2.9]) are valid. The case $\alpha = 0$ can be regarded
as a case of an 'essentially finite' number of parameters; the efficiency of the
MLE is then essentially, i.e. asymptotically, unimpaired (Neyman and Scott,
1948).

The result on the optimality of the CIVE also in the case $\alpha > 0$ allows an
interpretation in the sense of robustness or 'asymptotic efficiency of higher
order'. According to the explanations of Example 3.4.7 the asymptotic
efficiency in the case of $\alpha > 0$ can be regarded heuristically as being nearer to
optimality for finite samples (with e.g. small allocation number per group in
Example 3.4.7) than the asymptotic efficiency in the case $\alpha = 0$.

Concerning the comparison of the competing estimators, i.e. of the CIVE
$B_{0m}$ and the modified 2SLS-estimator $B_{(1)m}$, partially including $B_{(2)m}$ (for
$r = 2$, Example 3.4.10) and $B_{Tm}$ (Example 3.4.11), similar results have been
obtained in the literature. To compare $B_{0m}$ and $B_{(1)m}$, Anderson (1976) used
asymptotic expansions for fixed $m$ and certain diverging parameter values
(see Section 3.5.2). Tukey (1951), Madansky (1959), Dorff and Gurland (1961),
and Robertson (1974) formally calculated asymptotic variances. Moreover, the
conclusion of Fuller (1977) on the basis of the asymptotic MSE of modified
MLE and 2SLS-estimators (Section 3.5.3, (3.5.60)) agrees with the one obtained
here, i.e. with the efficiency of the MLE.




Anderson (1951b) calculated under (V3), (N), (A), n(m) = n (i.e. in the 
model of Example 3.4.3) the limit distribution of a parametrization of ?om 
as a matrix of eigenvectors. Malinvaud (1966) obtained a similar result for 
(V,), (N), « = 1 and from this he calculated Dg} for p = r = 2, q = 1. Schnee- 
weiss (1976) calculated the normal limit distribution of Bay under a general 
error distribution in the case of the specification (V,). Patefield (1977, 1978) 
found= Dy for (V,) 0H ly p= 3,4 = 2, gai ee ior (Vj -on ke 
r=p,q=1. Van Houwelingen and Schipper (1980) obtained a result on the 
accuracy of the normal approximation for the distribution of a parametrization 
of Lom. The calculation of Do has been performed by several authors (see - 
Section 3.5.1). From Sections 3.3.4 and 3.4.6 one recognizes that for IV with 
fixed dimension the distinction between random and nonrandom y;, Wj, 
t = 1,..., m is not essential for the asymptotic theory. Hence the results on 
the asymptotic theory in models (LIFU-) of Section 3.3.4 can be carried over 
to the case of nonrandom p;, 7 € N. 

The asymptotically efficient estimators $B_{2m}$ given in Section 3.4.4 can be
seen as resulting from an iterative improvement procedure specifically adapted
to the present model. The original method for models of independent,
identically distributed observations, the heuristic background of which is the
approximation of solutions of the likelihood equation (LeCam, 1956), cannot
be applied here because of the presence of incidental parameters that are not
consistently estimable. Some authors discuss inference on the basis of a
modified likelihood function, obtained by suitable elimination of incidental
parameters (Kalbfleisch and Sprott, 1970; Sprent, 1976; Klebanov and Melamed,
1978; Patefield, 1978). The estimator $B_{2m}$ can be interpreted as resulting from
an improvement procedure constructed on the basis of a modified likelihood
equation.

The comparisons of efficiency presented here always referred to estimators 
defined by means of the same instrumental variable. Comparisons of different 
instrumental variables as well as special methods of improvement and com- 
bination (Ware, 1972; Feldstein, 1974; Schneeweiss, 1975) remain outside the 
scope of this treatise. 


3.5 Special asymptotics 


In this section we describe some important results which could not directly
be included in the relatively closed theory for LIFU* with independent errors
given in Section 3.4. In particular, these include the information matrices for the MLE in
nonlinear models with replicated observations. Furthermore, some of the
estimators already introduced in Section 3.3 are considered. We also report
results concerning the behaviour of estimators for finite sample size.

In view of the large number of available results, it is only possible to
provide a survey of the asymptotic statements in this section, often without
detailed proofs. We try to present the results in such a way that, in a
practical application of the described estimation methods, the accuracy of the
estimates obtained can be computed approximately. Moreover, some of the
approximations described below admit statements for finite sample sizes in
practical use, as well as the construction of confidence intervals and tests.
In the present section the following problems are discussed, especially for
eigenvalue estimators in LIFUt:


1. The distribution theory for finite samples by applying infinite-series ex- 
pansions. 

2. The approximation of these distributions and their parameters for
increasing sample size or related models with changes of other model
parameters.

3. Statements on accuracy by comparing different measures of concentration 
and their approximations. 


Because of the difficulties with multivariate models we have been able to
present results almost exclusively for the case d, = 1. The accuracy of the
approximations can be compared either by comparison with the exact dis- 
tributions or with sufficiently exact simulations. Often these comparisons 
themselves have been obtained by simulations. The corresponding results are 
reported. With respect to the intention of the chapter the exact distribution 
theory will not be taken into account. 


3.5.1 Asymptotics under fixed experimental design


3.5.1.1 The model


In certain cases it is possible to develop a series of measurements in such a way 
that we have repeated observations over a fixed experimental design. The 
MLE (cf. Sections 3.2.6-3.2.7) provides consistent and, as we will see,
asymptotically optimal estimators of the system parameter and of the experimental
design. The corresponding information matrix and its approximation from 
the sample give an approximation of the accuracy of the estimation. We 
consider explicit models with replicated measurements according to (3.1.49) 
but without restrictions of the form 0 = p(z), as they were still permitted in 
Definition 3.1.3. For simplicity we suppose the same number of replications 
over the single design points and we also suppose normally distributed errors. 
Hence, let 


z_ij = μ_i + ζ_ij,   j = 1, ..., m_i,   i = 1, ..., n,    (1)

μ_i = μ_i(ξ_i, π) = [ξ_i, r_i(ξ_i, π)].


The basic formulas are derived for a general covariance matrix 2 € Wz. 
Specializations are carried out for 


Σ = Diag((I_{m_i} ⊗ Σ_i))    (2)
and
Σ = I_m ⊗ Σ_0.


The case Σ = A ⊗ Σ_0 can be treated, especially for LIFU*, as simply as
results from the corresponding remarks in Section 3.2.5. In nonlinear models
this case can also be treated simply. Because the case of a known matrix A is
of practically no importance, it is left out here. Formulas for it can be
found, for variable-classified data Z = [Z_(1), ..., Z_(p)] and hence in a
slightly different form, in Dolby and Freeman (1975) for d, = 3.


3.5.1.2 Asymptotics for maximum likelihood estimators 


Under sufficiently general assumptions the consistency of the MLE already 
results from the fundamental paper of Wald (1949). The numerous possible 
generalizations will not be considered in detail. We recall only the
generalizations to minimum contrast estimation (cf. Strasser, 1973) and to models
with unequal numbers of replications over the single design points. 

In the proof of consistency by Wald some assumptions are made. Besides 
some assumptions on the distribution of the errors for the present case, only 
one single assumption on the inner structure of the model is applied (Wald, 
1949, p. 596, assumption 4), namely, for different parameters the related di- 
stribution functions of the observations shall not be identical. But this is 
fulfilled for all models satisfying the assumptions of the identifiability theorem 
mentioned in Section 3.1.4. If necessary the parameter region has to be slightly 
restricted, as was explained following the identifiability theorem in Section 
3.1.4. 


Theorem 3.5.1 (Consistency theorem for MLE in models with fixed experimental 
design) In models with an equal number of observations on all design points the 
MLE is consistent if the conditions of the identifiability theorem are fulfilled and 
the distribution of the observations satisfies certain weak assumptions. 


Theorem 3.5.2 (Optimality theorem) If the z_(n)j, j = 1, 2, ..., are stochastically
independent random variables with the same density p_ψ(z) and if this density
fulfils certain simple regularity assumptions for the true parameter ψ_0, then each
consistent MLE is optimal among all asymptotically normal estimators in the
sense of minimization of the asymptotic covariance. With the notation

z_(m) := (z_(n)j)_{j=1,...,m_0},   m = m_0 n,

the convergence

ℒ{√m_0 (ψ̂(z_(m)) − ψ_0)} → N(0, Inf(ψ_0)^{-1})    (3)

is valid for almost all ψ_0 ∈ Ψ if Inf(ψ_0) is the information matrix.


A proof of the theorem where the regularity conditions are given in detail
is to be found, for example, in Nölle and Witting (1970, theorem 2.32).


3.5.1.3 The information matrix under a normal distribution 


Let (aN = N (0, D,(é)). The likelihood function for this case was given in 
Section 3.2.2. With the natural partition of the whole information matrix Inf 
into the blocks Inf, Inf;,, ..., Inf,, we get , 


Inf, = E, (041 dq) 
= 0,p QB, (SS) Qu 


eA (OP Cah) Cas) (Os. xan | Ont il)is (4) 
Infxe = (0| Gyr)" De(5)-4 Diag; (Za | ri); (5) 
Inf; = Diag; (I dzri)) D,($)"?, Diag; (Z| Gzr;]). (6) 


In the blocks belonging to y, we always have Inf, = 0 (0 = [&, 2]). Under the 
usual regularity assumptions on /, this results from 


E(6,1 yl) = —E(é,(2,1)) (7) 
(cf. Zacks, 1971, lemma 4.3.1) and from 
Ayal) = Goon) OnaD(S)$, oe = [6,7] (8) 


and because of E(¢) = 0. 

Now we want to give the block Inf, only for the case D(C) =I ® 2,y = x. 
But this is a standard problem from multivariate analysis for parameter 
estimation of a multivariate normal distribution (cf. Anderson, 1958): 


Inf_{σ_rs, σ_tu} =
    σ^{ru} σ^{st} + σ^{rt} σ^{su},   r ≠ s, t ≠ u,
    σ^{rt} σ^{st},                   r ≠ s, t = u,        (9)
    (σ^{rt})²/2,                     r = s, t = u,

if Σ^{-1} = (σ^{rs}).
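The case distinction in (9) can be checked numerically. The following is a minimal sketch, assuming a single bivariate normal observation and a hypothetical covariance matrix SIGMA; the helper names (matmul, inv2, d_sigma, info_trace, info_closed) are illustrative and not taken from the text. It compares the closed form with the general expression (1/2) tr(Σ^{-1} ∂Σ/∂σ_rs Σ^{-1} ∂Σ/∂σ_tu) for the information on the distinct elements of Σ.

```python
# Sketch under stated assumptions: per-observation information for the
# distinct elements of a 2 x 2 covariance matrix, closed form versus trace form.

def matmul(a, b):
    """Product of two small matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(s):
    """Inverse of a 2 x 2 matrix."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]

def d_sigma(r, s, p=2):
    """Derivative of a symmetric p x p matrix with respect to sigma_rs."""
    e = [[0.0] * p for _ in range(p)]
    e[r][s] = 1.0
    e[s][r] = 1.0
    return e

def info_trace(a, rs, tu):
    """(1/2) tr(A E_rs A E_tu), where A is the inverse covariance."""
    m = matmul(matmul(a, d_sigma(*rs)), matmul(a, d_sigma(*tu)))
    return 0.5 * sum(m[i][i] for i in range(len(m)))

def info_closed(a, rs, tu):
    """Closed form in terms of the elements a[r][s] of the inverse covariance."""
    (r, s), (t, u) = rs, tu
    if r != s and t != u:
        return a[r][t] * a[s][u] + a[r][u] * a[s][t]
    if r != s:                      # here t == u
        return a[r][t] * a[s][t]
    if t != u:                      # here r == s (symmetric counterpart)
        return a[r][t] * a[r][u]
    return a[r][t] ** 2 / 2         # r == s and t == u

SIGMA = [[2.0, 0.5], [0.5, 1.0]]    # hypothetical covariance matrix
PAIRS = [(0, 0), (0, 1), (1, 1)]    # distinct elements of Sigma

def max_discrepancy(sigma=SIGMA):
    """Largest absolute difference between the two expressions."""
    a = inv2(sigma)
    return max(abs(info_trace(a, rs, tu) - info_closed(a, rs, tu))
               for rs in PAIRS for tu in PAIRS)
```

Calling max_discrepancy() with other positive definite 2 x 2 matrices gives a quick consistency check of the three cases.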


3.5.1.4 Asymptotics for weighted least squares estimators 


Assume the model with the block diagonal covariance structure (2). For this
reason we do not take the class of all WLSE as a basis in the following, but
consider a slightly restricted class with similar block diagonal weighting
matrices. Under more general assumptions on the distribution of the errors
we would correspondingly have to permit more general weighting matrices. On
account of Section 3.2.2 we obtain MLE with known covariance. The weighting
matrices we are going to consider now shall be of the form:


W = Diag((I_{m_i} ⊗ W_i)),   W_i ∈ M_p^>.    (10)


The parameters of the model are ψ = [ξ_(n), π, Σ_1, ..., Σ_n]. But for the
calculation of the WLSE we only need the part


ϑ := [ξ_(n), π].    (11)


With that we obtain as the criterion for the estimator of ϑ the estimation
functional


l_1(ϑ) = m^{-1} ∑_i ∑_j ||z_ij − μ_i||²_{W_i},   m = ∑_i m_i.    (12)

Elementary transformations yield the equivalent estimation functional


l_2(ϑ) = m^{-1} ∑_i m_i ||z̄_i. − μ_i||²_{W_i}.    (13)
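The equivalence of the two functionals rests on the decomposition of the weighted sum of squares around the group means: the difference l_1 − l_2 equals the within-group term, which does not depend on the μ_i. A minimal sketch, assuming two design points with hypothetical replicated observations GROUPS and hypothetical weighting matrices WEIGHTS (all names are illustrative):

```python
# Sketch under stated assumptions: l_1 and l_2 differ only by a term that is
# free of the parameters, so both functionals have the same minimizers.

def wnorm2(v, w):
    """Weighted squared norm v' W v."""
    return sum(v[a] * w[a][b] * v[b]
               for a in range(len(v)) for b in range(len(v)))

def diff(x, y):
    return [xi - yi for xi, yi in zip(x, y)]

def mean(group):
    """Componentwise mean of the replicates at one design point."""
    return [sum(z[k] for z in group) / len(group)
            for k in range(len(group[0]))]

def l1(groups, mus, ws):
    m = sum(len(g) for g in groups)
    return sum(wnorm2(diff(z, mu), w)
               for g, mu, w in zip(groups, mus, ws) for z in g) / m

def l2(groups, mus, ws):
    m = sum(len(g) for g in groups)
    return sum(len(g) * wnorm2(diff(mean(g), mu), w)
               for g, mu, w in zip(groups, mus, ws)) / m

def within(groups, ws):
    """Within-group part of l_1; it does not involve the mu_i."""
    m = sum(len(g) for g in groups)
    return sum(wnorm2(diff(z, mean(g)), w)
               for g, w in zip(groups, ws) for z in g) / m

GROUPS = [                       # hypothetical replicated observations
    [[1.0, 2.0], [1.4, 1.8], [0.9, 2.3]],
    [[3.0, 1.0], [2.6, 1.2]],
]
WEIGHTS = [                      # hypothetical weighting matrices W_i
    [[2.0, 0.3], [0.3, 1.0]],
    [[1.0, 0.0], [0.0, 4.0]],
]

def gap(mus):
    """l_1 - l_2 evaluated at candidate values mu_i."""
    return l1(GROUPS, mus, WEIGHTS) - l2(GROUPS, mus, WEIGHTS)
```

Evaluating gap at two different candidate lists of μ_i returns the same number, namely within(GROUPS, WEIGHTS).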


The main simplification in this model with fixed experimental design is that
we have in fact got a regression model, which becomes obvious from the
functional dependence of the parameters μ_i on ϑ:


μ_i = μ_i(ϑ),   ϑ = [ξ_(n), π],    (14)
z_ij = μ_i(ϑ) + ζ_ij.    (15)


With this the functions μ_i correspond exactly to the regression functions in
usual regression models, and we can apply all results known from regression
models here.

By means of the identifiability theorems it turns out that we have identi- 
fiability of the system parameter in the models covered by the contact condi- 
tions (cf. Theorem 3.1.4 in Section 3.1.4). Under certain regularity assumptions 
the consistency and asymptotic normality of the WLSE follow. Now we apply 
to the special regression model the results from Chapter 1, from which we do 
not need the more complicated extensions for infinite experimental designs, 
but the extensions for multivariate regressands. However, the corresponding 
modification of the results from Chapter 1 is a purely formal matter. 

In order to be able to transfer the results mentioned, we have to check the
regularity assumptions on the differentiability and smoothness of the function


μ_i = μ_i(ϑ) = [ξ_i, r_i(ξ_i, π)],    (16)


which can mostly be considered to be satisfied in practice. Necessary for
the proof of consistency is the compactness of the parameter set Θ for ϑ. In
practical applications this mostly results from physical or other natural 
bounds for the region where the state variables and structural parameters may 
vary. In the present special case we thus finally have to check the
identifiability assumption, which was formulated for more general models. For this
purpose we assume that the relative frequencies of the numbers of observations
over the single design points are positive:


h_i := lim_{m→∞} m_i/m > 0,   i = 1, ..., n.    (17)


This is a sensible description of non-negligible observation frequencies over 
each design point. The identifiability assumption by Jennrich (cf. Section 
1.1.6.2, condition A4) now demands that 


∑_{i=1}^{n} h_i ||μ_i(ϑ) − μ_i(ϑ_0)||²_{W_i} = 0    (18)

yields

ϑ = ϑ_0.    (19)

But, for h_i > 0,

μ_i(ϑ) = μ_i(ϑ_0),   i = 1, ..., n,    (20)

results from (18).
Because of the special form of the μ_i in explicit models,

ξ_i = ξ_i0,   i = 1, ..., n,    (21)

which yields the equations

r_i(ξ_i0, π) = r_i(ξ_i0, π_0),   i = 1, ..., n.    (22)


Now we can apply the identifiability theorem from Section 3.1.4. According
to this, π = π_0 for n > dim Π under conditions on the system equations that
are mostly fulfilled in practice. If necessary one has to add an inessential
restriction of the parameter region Ψ, as was explained in Section 3.1.4 following
the identifiability theorem (Theorem 1.1.5 with n replaced by m = ∑ m_i).


We will immediately formulate the results of Chapter 1 in the needed multi- 
variate form. 


Theorem 3.5.3 (Theorem on the asymptotic normality of WLSE in models with
fixed experimental design) For WLSE with weighting matrix
W = Diag((I_{m_i} ⊗ W_i)) we have


ℒ{√m (ϑ̂ − ϑ_0)} → N(0, D(ϑ_0, W)),    (23)


where ϑ_0 = [ξ_(n)0, π_0] is the true parameter and the covariance of the limit
distribution is obtained from the following relations:


D^∞(ϑ̂) = D(ϑ_0, W)
        = C(Diag((h_i W_i)))^{-1} C(Diag((h_i W_i Σ_i0 W_i))) C(Diag((h_i W_i)))^{-1}    (24)

with

C(W) := ∂_ϑ μ_(n)0 W ∂_ϑ μ_(n)0',    (25)


where ∂_ϑ μ_(n)0 has a special form corresponding to our special model:


∂_ϑ μ_(n)0' = (∂_ξ(n) μ_(n)0' | ∂_π μ_(n)0')    (26)

              [ ∂_ξ μ_1(ξ_10, π_0)'          0            | ∂_π μ_1(ξ_10, π_0)' ]
            = [          ...                ...           |         ...          ]    (27)
              [           0          ∂_ξ μ_n(ξ_n0, π_0)'  | ∂_π μ_n(ξ_n0, π_0)' ]

∂_ξ μ_i = [I_{d_ξ} | ∂_ξ r_i],   ∂_π μ_i = [0_{d_π × d_ξ} | ∂_π r_i].    (28)


3.5.1.5 Asymptotic optimality of weighted least squares estimators 


Possibly the practically most important statement with regard to the
minimization of the limit covariance matrix is


‘GLSE are the best WLSE’.

For GLSE with W_i = Σ_i0^{-1} this limit covariance matrix attains its minimal value

D^∞(ϑ̂_GLSE) = C(Diag((Σ_i0^{-1} h_i)))^{-1} = C(Diag((Σ_i0/h_i)^{-1}))^{-1}.    (29)


This optimality follows from a computation of the lower bound as in Theorem
1.1.6. But the covariances Σ_i0 are mostly unknown in applications. With
replicated observations they can be consistently estimated. The optimality
statement remains valid for each consistent estimator of Σ_i0. In particular,
WLSE with the following weightings are asymptotically optimal:

W_i^{-1} = ∑_{j=1}^{m_i} (z_ij − z̄_i.)(z_ij − z̄_i.)'/m_i.    (31)


WLSE with such a weighting are two-stage estimators. 
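The statement that GLSE are the best WLSE can be illustrated in the simplest situation. The sketch below assumes a scalar parameter ϑ, bivariate observations at three design points, and hypothetical gradients GRADS, frequencies HS and covariances SIGMAS; none of these values come from the text. In that case the limit variance of a WLSE has the sandwich form C(hW)^{-1} C(hWΣW) C(hW)^{-1} with scalar C, and the choice W_i = Σ_i^{-1} gives 1/C(hΣ^{-1}).

```python
# Sketch under stated assumptions: sandwich variance of a WLSE for a scalar
# parameter, compared with the variance obtained for the GLSE weights.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(s):
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]

def quad(d, m):
    """Quadratic form d' M d."""
    return sum(d[a] * m[a][b] * d[b] for a in range(2) for b in range(2))

def c_value(grads, hs, mats):
    """Scalar analogue of C(Diag((h_i M_i))): sum of h_i d_i' M_i d_i."""
    return sum(h * quad(d, m) for d, h, m in zip(grads, hs, mats))

def wlse_variance(grads, hs, ws, sigmas):
    """Sandwich form: bread^{-1} * meat * bread^{-1}."""
    bread = c_value(grads, hs, ws)
    meat = c_value(grads, hs,
                   [matmul(matmul(w, s), w) for w, s in zip(ws, sigmas)])
    return meat / bread ** 2

def glse_variance(grads, hs, sigmas):
    """Limit variance for the weights W_i = Sigma_i^{-1}."""
    return 1.0 / c_value(grads, hs, [inv2(s) for s in sigmas])

GRADS = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0]]   # hypothetical gradients of mu_i
HS = [0.5, 0.3, 0.2]                            # hypothetical frequencies h_i
SIGMAS = [                                      # hypothetical covariances
    [[1.0, 0.2], [0.2, 2.0]],
    [[3.0, -0.5], [-0.5, 1.0]],
    [[0.5, 0.0], [0.0, 0.5]],
]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
```

By the Cauchy-Schwarz inequality, wlse_variance with any positive definite weights (for instance IDENTITY at every design point) is never smaller than glse_variance; inserting the inverse covariances as weights reproduces the GLSE value.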


For stationary errors we know an analogous statement (Hannan, 1971; 
Robinson, 1972). For normally distributed errors the GLSE is also asymptoti- 
cally optimal under various other criteria; details are to be found in Chapter 1. 
The optimality of the GLSE under normal distribution with known covariance 
is obvious from the statements for the MLE made in the present section. 
In case the number of replications at each design point is m_0, it also follows
for m_0 → ∞ that


ℒ{√m_0 (ϑ̂ − ϑ_0)} → N(0, D^∞(ϑ̂)),    (32)


where D^∞(ϑ̂) has essentially the same form as in the general case, except that
the weightings h_i have to be omitted.


3.5.1.6 Asymptotic covariance of the estimation of the structural parameters 


In most practical applications it is useful to have the asymptotic covariance
of the estimator π̂ of the structural parameters separately. The accuracy of
the estimated design points ξ̂_i is of less interest, since the μ_i often only play
the role of basis-points in a ‘gauge-experiment’ for the determination of the
structural parameters of actual interest.

We only consider GLSE. The formulas for the practically more important
two-stage GLSE are the same. We obtain the asymptotic covariance of π̂
by means of the known inversion formula for block matrices:


Σ_π̂ = D^∞(π̂) = (D_π − D_πξ D_ξ^{-1} D_ξπ)^{-1},    (33)


where the D_ξ, D_π, D_ξπ arise from the natural partition of D^∞(ϑ̂) in (24), where
the weighting matrices leading to GLSE are inserted. With elementary
transformations we obtain by using (27), (28), (29)


Σ_π̂^{-1} = ∑_i h_i ∂_π μ_i0 Σ_i0^{-1/2} (I − P_{L_i}) Σ_i0^{-1/2} ∂_π μ_i0'.    (34)

P_{L_i} is the orthogonal projection on the linear subspace

L_i = R(Σ_i0^{-1/2} ∂_ξ μ_i0'),    (35)

where ∂_ξ μ_i0 = [I_{d_ξ} | ∂_ξ r_i0].
But now

I − P_{L_i} = P_{L_i^⊥},   L_i^⊥ = R(Σ_i0^{1/2} [−∂_ξ r_i0' | I]'),    (36)

which finally yields

Σ_π̂^{-1} = ∑_{i=1}^{n} h_i ∂_π r_i0 ((−∂_ξ r_i0' | I) Σ_i0 [−∂_ξ r_i0' | I]')^{-1} ∂_π r_i0'.    (37)
This is a natural generalization of some known formulas that have been derived
for functional relations, i.e. in case r_i = r_0. The case d, = 2 is treated in Dolby
and Lipton (1972), and the case m_i = m_0, Σ_i0 = Σ_0 in Dolby and Freeman
(1975) for d, = 3. The results derived there resulted only from the formal
investigation of the MLE and the information matrix, without asymptotic
distribution and optimality statements.

Finally we apply the results to the simple bivariate linear models. From (37)
we can obtain a bound of the covariance of the estimator of π = [α, β] by
computing the asymptotic covariance. For replicated observations of a fixed
experimental design, the inverse of the information matrix converges to the
asymptotic covariance of √n (π̂ − π_0). (This is also valid for all sequences of
experimental designs in which the number of design points increases ‘slowly’
relative to the numbers of replications.)

Because of Q4,)(% + B&F) = (1, &:), Or; = B, Oc5 = 0 (37) yields (cf. w(f) 
in Equation 3.2.1.1 (23) and 3.8.1 (4)) 


Me 
Fen = E90 (2 fa) 28) = as + 0s) 


We obtain a consistent estimator by replacing é; and s,(8) by &; and si(b); 
respectively. The covariance estimator itself we get by inverting and dividing 
by n (see also Barnett, 1970, for o, = cos, c known): 


_ ate) 
18g a bs aos Nie a (38) 
m 


Take into consideration that this formula is formally also valid for the case 
of nonreplicated observations (Barnett, 1970), but that it does not provide the 
true asymptotic covariance, which Patefield (1977) gave under the conditions 


lim € < 00, sion lim Dijin coz 


n—>oo N—>0o 


(see also Section 3.4) and o;¢ = co; (c known): 


Deion e clear) 
(Co Bae Bee ie 
: oye Uae) n(1 + 7) 
D°(&, P) = ae: —_ (39) 
n y Co Ei 


—1 
with 
T = Cos/(c + B?) o; 




A consistent estimator D of n - D™ is 

(c + B) of ( m(1 +t) + dzy/(np) —#.(1 + 2) (40 
day —#(1 +4) 1k x ) 
with 


t=n-c-oB/(c + 82) diye 


If o5 is unknown, we can use the consistent estimator 2n63(n — 2), where 6; 
is the MLE from this model (Kendall and Stuart, 1961, (29 56)): 


65 ore (d, ae 2Bday a Bd,,)/2(c ate B?). 


3.5.2 Comparison of MLE and two-stage LSE in linear functional 
relations with nonrandom unobservable variables 


We consider the model denoted in (3.1.38) as LIFU* with linear regression
part, which turned out to be the reduced form of a linear simultaneous
equations model. We suppose normally distributed errors with the covariance
I_m ⊗ Σ in the distribution model, where Σ is unknown. The corresponding
distribution approximations were computed by Anderson and Sawa (1973) for
the 2SLS estimation, and by Anderson (1974) for the MLE. The probabilities
of falling into certain intervals, computed on this basis by Anderson (1974),
permit an asymptotic comparison of accuracy, where the calculation is based
on an experimental design which is spreading. The results for known Σ are
compiled in Anderson (1976).

The following comparisons of accuracy for p = 2 are of fundamental character
for general multivariate LIFU*, because there is no doubt that their
qualitative statements for finite sample sizes carry over to the multivariate
case. This is especially important because corresponding distribution
approximations for p > 2 demand a considerably greater technical effort.

Comparing the accuracy, there arise some problems, which we will explain 
briefly (Anderson, 1976, p. 8). Taking the MSE criterion, the 2SLS-estimator 
has to be preferred to the MLE, since moments to the order m — 2 of the 
2SLS-estimator are finite and all higher moments are infinite (Mariano, 1972). 
With this, the 2SLS-estimator has finite variance for m = 2, but the MLE 
does not. However, the probability that the MLE will fall into an interval 
around the true parameter may be greater than that of the 2SLS-estimator, 
namely for most of the practically interesting intervals. 

To describe the approximation of these probabilities we need the parameters 


SO om ESy,vé', Ces (0, 1) 2[—8B, 1], (41) 




r= SVY_B, 1] a(—B, 1) S¥2/det [2] = xé,/det [I], (42) 
x := (Boz — o45)*/det [2]. (43) 


The parameter τ is a measure for the spatial dispersion of the experimental
design in proportion to the dispersion of the error. The approximations are
carried out up to the order O(τ^{-2}). The following assumptions are
included:


1; We [OV Mees N= Ny + Ng; 
2. [M,J=1 and &,) +0; 
3. 2/(m — n) os is a bounded sequence (Anderson, 1977, p. 512). 
Then (Anderson, 1974, 4.27), 
Py := P(\8y — | S 26,/z) = Oe) — O(—2) 
+ 1-1(—(n, — 1) a + (2% — 1) x8 — x25) D(x) + O(r-?). (44) 


It is obvious that for x = 0 also Py. = 0 + O(r-?), and £ is the median of Bu 
up to the order O(t~?). 
For the 2SLS-estimator the probabilities (Anderson, 1974, (5.4) are 


Prs := P(|xs — B| S 26,/ x) = G(x) — O(—2) 
+ HH(n, — 1) — (n, — 1)? x) x 


+ (2n,x — 1) x — xx*} O(x) + O(r-*). (45) 
This yields as the difference of the two probabilities: 
A= Py — Pos 
= P(x) (n, — 1) x(((my =i? — 2ncc*) |x) + O(r-2). (46) 
For ; 
% S 2(n, — 1) (47) 


we have 4 < 0 for all x, and ys is better. If (47) does not hold, then 4 < 0 
only if 


x < ((m, — 1)/2 — 1x)? < ((m, — 19/2)”. (48) 
This yields the following result. 
Theorem 3.5.4 Under normal distribution the 2SLS-estimator for κ ≤ 2/(n_2 − 1)
is uniformly better than the MLE, and otherwise over intervals (−x, x) with
x < ((n_2 − 1)/2)^{1/2}, up to remainders of the order O(τ^{-2}).


A corresponding statement follows from the comparison of the MSE by
asymptotic expansions approximating the true distribution (Anderson, 1974,
p. 572). The distribution of β̂_M for a known Σ was given by Mariano (1973).
The asymptotic expansions are the same up to the order O(r-*/?) under a 
certain assumption. With unknown Z, the Wishart-distributed estimator 
Szw with m —n degrees of freedom is used instead of the known matrix Z. 
According to Anderson (1977) this yields under the assumption 3, the stated 
equality of the asymptotic expansions. 

Approximations of the distribution of the estimator θ̂ = tan^{-1} β̂ of the
angle θ = tan^{-1} β were developed in the same way; for Σ = σ²I the
distribution of θ̂_M does not depend on θ (Anderson, 1976, p. 4). The same is
probably also true in general multivariate LIFU* for the Eulerian angles of the
subspace = S,. 


For LIFU* without regression part, Patefield (1976) investigated the
validity of the approximations for the resulting θ-estimators corresponding to
(44) and (45) in a simulation study. It turned out that the approximation is 
approximately valid in the region m < (2/3) (r + thes where r has the follow- 
ing form under this distribution model: 


t = (1 + fF) Seo. (49) 


The asymptotic expansions considered by Anderson (1976) were based on
fixed m, β and τ → ∞. Another asymptotic expansion is obtained for τ → ∞
with fixed τ/m and β. Then (Patefield, 1976, p. 46),


by — 0) ~ N(0,1 + (m — 1)/z). (50) 


results. It seems that this approximation is valid in a somewhat larger region
of τ and m (Patefield, 1976, p. 56).

It is much more difficult to get approximations of the distribution for p > 2.
First, in Sugiura (1976, theorem 1.5) we can find such a distribution of the
smallest eigenvector for the case q = 1, where the normalization used there
corresponds to that of Theorem 3.2.9. But one has to make some formal
adaptations in order to be in a position to transfer the results to the MLE
or even to consider general eigenvalue estimators. For q > 1 the joint
distribution of several eigenvectors has to be approximated.

Statements on the relations between the asymptotic expansions explained
here and the approach with small errors, i.e. σ → 0, which was studied by
Kadane (1970, 1971), are to be found in Anderson (1977).


3.5.3 Comparison of modified MLE and two-stage LSE 
in linear functional relations 
with nonrandom unobservable variables 


In the following we will consider the same models as in Section 3.5.2, i.e.
LIFU* with linear regression part. Now the models with p > 2 are included,
while the assumption q = 1 is kept. We present the results of Fuller (1977) on
the approximate comparison of accuracy for the modified MLE and
2SLS-estimators developed there. The modified MLE and 2SLS-estimators were
introduced in (3.3.28) and (3.3.29), respectively. Now we suppose the
sequential model to be defined by a series of matrices W_(m), m = 1, 2, ..., where
only n_2 is fixed and n_1 may increase:


1(Wm)) =n=m(m) +n, m= 1,2,... (51) 
For the series of parameters M, = M,(n,) = [, 7], € € Mip-g)xn, We assume 
lim S3/n; = D; € M>_y. (52) 


To simplify the derivation we suppose as usual (3.2.81), 


OY’ == or Py, = Py + Py: (53) 
as well as 
UU jit. (54) 


which is achieved by the transformation following (3.2.81) 
& = ((00")" Ym) 0, 
(55) 
M, := M00’ y2/Vm. 


Thus Fuller (1977) was able to show that the kth moments (k = 1, 2,...) of 
the modified estimators are bounded for all m > m(k) and that for the esti- 
mation of the parameter b € IR?-1: 


(« — 1) 
m 


E(Bypg — 6) = Sz'(Zse} Xs) (1, —b] + O(m-*), (56) 


E(bym — 6) = ee S7"(2Zse | 25) [1, —b] + O(n’). (57) 


Here S; * — 0,(n;"). With 
Zz = (—0',1)2[—6, 1], 2 es 2 (—8’, 1) [251 Ze] 
we get with MSE (b.) := H(b — 6) (b. — 6)’ if Eb. = b. 
MSE(byu) = m-1Sz' 2; + m-2Sz' LZ; tr (Sz*Es) 
— m-?2(% — 1) Sp LzSz' Zse 
+ m-*(n, — (p — 1) — 2(« — 1)) S712,S712; 
+ m-*((2 — «)® — m, — p + 1) S7125,2 5877 
+ O(m-8). (58) 




If the transformations resulting in (53)-(55) are taken into account, we have
to write this equation with the variables of the original model. However, one
also has to transform the estimator. As one may conjecture from the
equivariance of WLSE (cf. Section 3.2.3.3), the 2SLS and the LIML will
also be equivariant. Therefore we may assume that both sides of (58) then undergo
the same quadratic transformation corresponding to U C20: 
Furthermore, 


MSE(by2s) — MSE(6ym) 
— 2m-*(n, =p, +L 1) S712 52'¢587* + O(m-) ‘ (59) 


Moreover, for « = 4 we obtain estimators whose MSE is uniformly smaller up 
to the order O(m-*) than for each smaller «, since the MSE depends on « by the 
term 


—m~*2aSe*(LseS7*Lz5L aims = 272355") 
-m-*(a? — 4x) Sz1Zy¢Z 7557? + O(m-8). (60) 
Because of 


SPL S12, — Sz1Z5,Zz9S 
and 
SPL zgS7 Leg — Sz! X52 XegS7" 


are positive definite, we obtain the statement. 

These results explain the somewhat poor accuracy of the MLE compared
with the 2SLS-estimator in several Monte Carlo studies. If we neglect for a
moment the modifications that generate finite moments, then the 2SLS-estimator
is a k-class estimator with α = n_2 − p + q, while the MLE results for α = 0.
Thus we can in general expect a greater accuracy of the 2SLS-estimator for
LIFUt with linear regression part. For the same value of α one will have to
prefer the modified MLE because of (60).


3.5.4 Asymptotics under dependent errors in linear functional relations
3.5.4.1 Minimum contrast estimation 


The minimum contrast estimator developed by Robinson (1977) is studied
in this section (cf. Section 3.3.2). Only the consistency of the estimator
is proved in detail, while the asymptotic normality and further aspects will
not be examined. As an immediate consequence of a law of large numbers,
the inconsistency of the OLSE also results under very general assumptions
if we do not have a regression model.
Provided that the assumptions A1, A2 from Section 3.3.2 are satisfied, 


Se foe Ty + 3s, | ToB’ 
lim Sz[n = |---|") = (-------|--~------ = S(y) (61) 
Mares) BT, | BT,B' + 5, 


y 


is true almost surely. As a typical example we consider 
m(Sx — 2,) = (Sz— DY Ti) + Sep + Sez + (Sz — %5,) 
t=1 


(Lies Pe): (62) 


Th. 


i 


By (3.3.47) the last term converges to zero. 

Assumption Al also yields that the other terms are sequences of so-called 
martingale-differences which have uniformly bounded (c/2)-moments because 
of A2. For example, 


E(§ 6; | B;) = E(§,E(d; | Bs,)| Bi) = 0 (63) 
and 
Bd §;\\"!? S (Ello ° E\\§ I°)/?. (64) 


Consequently, the terms in (62) are of the order 
O,(n@lO-1 (log n)**l¢(log log n)2/*) 


(Robinson, 1977a, theorem 1). Under the assumptions A3 and 24 + 0 we get 
for the OLSE: 


SxySz Gr BT (To + Sj)t+B. (65) 
Conclusion. Under very general assumptions the OLSE is consistent if and only
if the ξ_i are regressors in a multivariate linear regression model. This holds both
for random and nonrandom experimental design.
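The conclusion can be illustrated by a small simulation in the simplest scalar case. The sketch assumes a nonrandom design ξ_i cycling through ten equally spaced values (so that the limiting design variance is T_0 = 8.25), independent normal errors, and the scalar form β T_0/(T_0 + σ_δ²) of the limit of the OLS slope; the design, the error distribution, and all names are illustrative assumptions rather than part of the text, whose assumptions allow dependent errors.

```python
# Sketch under stated assumptions: OLS slope when the regressor is observed
# with error (x = xi + delta) versus the attenuated limit.
import random

def ols_slope(n, beta, sd_delta, sd_eps, seed=0):
    """OLS slope of y on x for y = beta*xi + eps and x = xi + delta."""
    rng = random.Random(seed)
    xi = [(i % 10) - 4.5 for i in range(n)]          # nonrandom design
    x = [v + rng.gauss(0.0, sd_delta) for v in xi]   # observed regressor
    y = [beta * v + rng.gauss(0.0, sd_eps) for v in xi]
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def limit_slope(beta, t0, var_delta):
    """Scalar version of the limit beta * T_0 / (T_0 + sigma_delta^2)."""
    return beta * t0 / (t0 + var_delta)

T0 = 8.25   # mean square of the design values -4.5, -3.5, ..., 4.5
```

With sd_delta = 0 the simulated slope settles near β, which is the regression case; with sd_delta > 0 it settles near the attenuated value returned by limit_slope.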

Under a uniqueness assumption we can show the consistency. 
Assumption A5 Let A_0 = A(ψ_0), Ω_0 = Ω(ψ_0), h(ψ_0) = h_0. Then the equations

A(ψ) = A_0,   Ω(ψ) = Ω_0,   h(ψ) = h_0    (66)


possess only the unique solution ψ = ψ_0 over ψ ∈ Ψ.


Theorem 3.5.5 Under A1 to A5, ψ̂ = ψ̂(z_(n)) is consistent.
Proof. Because of A1 it uniformly holds for all y almost surely, that 

L(y) > Up) := —log det [2] — tr [2-7 (y)] (67) 
(cf. (3.3.31) —(3.3.41)). Then 


L(y) — Uy) = h(y) + bly), dart aaregt) 
L(y) = —log det [2,Q(y)] + tr (QAy)-1) —(p — 9), (69) 
lo(y) = tr ((4o — A(y)) 2.(49 — A(y))’ Qy)). (70) 


From the representation by the eigenvalues and the inequality log x ≤ x − 1
it follows that l_1(ψ) ≥ 0. Equality holds if and only if Ω(ψ) = Ω_0. Likewise
l_2(ψ) ≥ 0 is true, with equality if and only if A(ψ) = A_0, because both the
inner matrix in (70) and Ω(ψ)^{-1} are positive definite.
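The eigenvalue argument can be tried out numerically. The sketch assumes that the first contrast term has the form −log det[Ω_0 Ω^{-1}] + tr(Ω_0 Ω^{-1}) − d with d = 2, which is one reading of (69); the matrices used are hypothetical. Since the eigenvalues λ_k of Ω_0 Ω^{-1} are positive, the expression equals the sum of the terms λ_k − 1 − log λ_k, each of which is nonnegative.

```python
# Sketch under stated assumptions: nonnegativity of the first contrast term
# for 2 x 2 positive definite matrices.
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(s):
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]

def contrast_l1(omega0, omega):
    """-log det(M) + tr(M) - 2 with M = omega0 * omega^{-1}."""
    m = matmul(omega0, inv2(omega))
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return -math.log(det) + m[0][0] + m[1][1] - 2.0

OMEGA_0 = [[2.0, 0.4], [0.4, 1.0]]   # hypothetical true matrix
OMEGA_1 = [[1.0, 0.0], [0.0, 3.0]]   # hypothetical competing matrix
```

contrast_l1(OMEGA_0, OMEGA_0) is zero up to rounding, while contrast_l1(OMEGA_0, OMEGA_1) is strictly positive.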

Now the usual conclusion principle for minimum-contrast estimators is
applied. Namely, by A5, L̄(ψ_0) > L̄(ψ) is valid if ψ ≠ ψ_0. If there were a
subsequence {ψ̂_m} ⊆ {ψ̂_n} converging to some ψ̃ ≠ ψ_0, then


0 ≤ L(ψ̂_m) − L̄(ψ_0) → L̄(ψ̃) − L̄(ψ_0) < 0


would have to hold, which would be a contradiction.


3.5.4.2 Identifiability


For given values A_0, Ω_0, h_0 the equations (66) can have arbitrarily many
solutions. But we can always find a sufficiently small neighbourhood of ψ_0
in which the solution of (66) is unique. This follows from the implicit-function
theorem provided that the necessary condition on the constant rank of the
Jacobian matrix is satisfied. This procedure can provide only a local
identifiability, in the sense of the local character of the theorem on implicit
functions. In practical applications of the described estimation method we will
have to rely on being able to restrict attention to a reasonably small
neighbourhood of the parameter, in which the identifiability is also secured.

Finally, we want to mention the identifiability condition for a case that 
often arises in applications. For this purpose let the matrix 7 be regular, 
which was not demanded until now. Furthermore, let us assume that no de- 
pendences exist among the elements B, 2%, 2, in h. For the local identifiability 
it is necessary and sufficient that the null spaces of Ogh(Bo) I, ® Ty ', eXsh(2s,) 
and é@;,h(X,,) By © Bo have a nonvoid intersection. With this we have at 
least one criterion that provides — even if not the global — but at least a 
local identifiability. 




3.5.5 Nonlinear models with increasing experimental design 


For the Gauss-Newton iteration given in Section 3.3 that approximates the 
WLSE, the asymptotic normality can be shown under weaker assumptions 
for the distribution of the errors. The following assumptions are used: 


V1. The functions 7; have continuous and uniformly bounded first and second 
derivatives. 
V2. Let 


ly — To = Op(m-), m= mbm (71) 
hold for the initial estimator 2. 


Furthermore we assume that we have the model with replicated observa- 
tions and as initial estimator of é;) we choose: 


Sia := %.. 
The advantage of using %;. consists in its simplicity and in the simplifica- 
tion of the proof in Theorem 3.5.6. 
V3. Let the $j) be mutually independent and let 


Boin=9, Doin = X= 2h. (72) 


V4. 2, is known and regular. 
V5. lim I,,2,, = © € Mp, Le. 2, = O(i5}). 


m—>oo 


V6. 2,/m is regular for m > p and 


lim 2)o/m =: 2, € Me. (73) 


—>oo 


V7. For a x > 0 it holds for all 2, m that 
E |S imo? ** SK < 00. (74) 
V8. Let » = o(1), ie. I-? = o(m-"?2), 


The last condition may be characterized as ‘weakly increasing size of the ex- 
perimental design’ compared with the number of replications. The acquisition 
of the initial iteration 2,, the treatment of unknown covariances of the ¢; and 
the importance of assumption 7 will be considered at the end of this section. 
The asymptotic normality is proved in several steps, in which it is shown that 
the iterations for the parameter, like &,, 2,,, etc., converge with a certain 
order against the true parameters &j9, 2'x9 etc., and that this also holds for zy. 




Following that, the limit theorem of Lyapunov can be applied. The proof uses
an idea developed by Fuller and Wolter (1982) for the case q = 1 and, with
various modifications, extends to more general models, especially for
q > 1.


Theorem 3.5.6 §;, — 9 = Op(I-1/). 
Proof. For arbitrary a > 0 we can choose a 6; > 0 with 
Ds} ) [bi < a. (75) 
From Chebyshev's inequality we obtain
PST Sp] > p 2? - by) (76) 
< D(S{)/(bj- p- pt) <a. (77) 


Thus the theorem follows from assumption 1 by the definition of order for 
random vectors. 


Theorem 3.5.7 Extending Theorem 3.5.6 we obtain 


Gi, — fig = 5, + Op(m-¥2) (78) 
with 

J, = (Zi Orrin) Jai zr iol) ee 0:79) Fin. (79) 
Proof. Because of 2, — a = Op(m-"/?) by A2 the Taylor series yields 

UE Ti 9; ie Bo a 

= Big — Oerio(Eir — E10) + Op(m-¥?). (80) 

Due to the boundedness of the second derivatives, 

O%iy — Opin = Op(-¥?) , (81) 
is true by Theorem 3.5.6. Thus 


Yi — Ni = Fin — O7in(Sir — Si0) — Or(-¥?) (Sir — €i0) + Op(m-?) 
(82) 


By Theorem 3.5.6 the third item is Op(J-1). Then it follows from (3.3.68) and 
Op(max (1-1, m-"/?)) = Op(m-¥?) because of V8 that 


OprinE*(Bin — Oerio(Sir — Fi0)) + 2*(8i0 — Perio(Sia — Ein) 
+ OprinE (ae, — Fix) + Z(t, — £1) = WOp(m*”). (83) 




Now 
xv, — Sis =X; — Ein — (§ — Ei) = Sing — A§;, (84) 


and we get
(AirigX*Oerig + L%0erig + Orink” + LZ) A,§; + 1- Op(m-¥?) 
== Oripd Eig + VEig + Orin dS ig + L°S in (85) 
or, equivalently, 
(I } Orig) SLL | Ogrig) 418i + 1 - Op(m-?) = (It Ggrig) 2-Sio- (86) 
Because of X-U-! = O(1) and dri9 = O(1) we obtain the assertion. 
Theorem 3.5.8 With &9 = (—Osrio | Ip-q) Sin we have Ed ei) = 0. 
Proof. The assertion immediately results from 
(I) ér9) [Arn i I]=0. Of 
Theorem 3.5.9 
Sy = Zi + 1-Op(t-?). (87) 
Proof. 
Sa = (—Ora D) L[eral] 
= (—2:ri9 + @:(rio — Tir)! I) 2[—Orio + O79 — Tun) I) 
= Sin + (Arion — Pia) 1 0) S[—Ario tT) 
+ (—€sri } Z) Z[8(rio — Tin) | 0) 
+ (8(ris — rio) (0) 2[G(ti — rio) 10]. (88) 


In Theorem 3.5.7, 01%, — Orig = Op(l-/2) was already used and because of 
Det == O( 1), 


Sele 2 ol Osliet 2). (89) 
For regular matrices A, B, 
ASE BENS Are Ase 8] Ba) (90) 


Because of (2',1)-1 = Op(1) which follows from Amin(A'BA) = Anin(B) if 
A’A =I4+ D, D =O, we have the assertion. 


Theorem 3.5.10 


Zyl = Ligh? + Op(l-¥?) (91) 




and 
Za/m = Lyo/m + Op(I-¥?2). (92) 


Proof: For the single rows 0,r\, k = 1,...,q of 0,7; we obtain 

Ont? = Oar) + (Sir — Eio)’ Oenr$ + (at, — 209)! Onan) 

+ Op(||%™%, — 19, Si — Ei0ll*). (93) 

According to Theorem 3.5.6 it follows as in (77) that 

Onfin = On? ip + Op(max {I-1?, m-V2, F-1}) = ,rig + Op(I-¥2). (94) 
With the definition of 2,,, and 2,4 in (3.3.78) we obtain the assertion. 
Theorem 3.5.11 We have 

Ei, = Eig + A, iy(% — ™%) + Op(m-"?). (95) 
Proof. We use ; 

&i1 — Td; = & + Tio — Ta — O11 dj. (96) 
With the Taylor series expansion 


Pig = Pig + 0,8 i (%o — MH) + Oia (Fin — Sir) + Or(lot9 — 1, F109 — Sill?) 


and & — §i = Six — dio, one gets (97) 


Ein = Fin + 2,0 in(% — M1) — OrinSig + Op(l-}). (98) 


With the Taylor series expansion of @;:rj, at Orig and from V8 we get the 
assertion.


Remark 3.5.1 If V5 and V8 are slightly weakened with 1,1 = o(m-™), then
the Taylor series expansion including the second derivatives provides
the correction terms that are needed to prove the asymptotic normality. The
boundedness of the third derivatives then has to be required in addition.
Fuller and Wolter (1982) showed this for q = 1. In that case, however,
$O(\max(l^{-1}, m^{-1/2})) = O(m^{-1/2})$ no longer holds.


Theorem 3.5.12 
n ~ 
Ny — My = Lag dy (On% indi Ein) + 0p(m-1?) (99) 
= 


is true, where 


Ms — Tg = Op(m-"2), 




Proof. Multiplying (3.3.77) by 1 = (In)~1 (In) yields 


2, — 4, = (Exim)? = Sot gE)? En (100) 


NM i=1 


where these matrix terms are stochastically bounded. According to Theorem 
3.5.11 this provides 


1 ys " > 1 
a So XY Aix (l2in)* €io + (% — 1) + mie: 0p(m-12) 
“iq (101) 


The terms calculated for 2,, &;; can be replaced by those calculated over 1, &io 
according to Theorems 3.5.9 and 3.5.10. For dzriy = (@n7 Peay... it holds for 
he) seg that 


anv!) = dnr) + (Ei, — io)! Acar + Op(max (I-1, m-¥?)), (102) 
and by Theorem 3.5.7, 

= dar) + §'Osar) + Op(max (1-1, m2) (103) 
Thus we get 


n 
ede —] , v—lz 
Me — My = 279 DY) OnTin~in Ein 
a 


+ (Zz /m)-1 = DS (An—7$5;)*= ar s! (Says Eig -+ op(m-1!2) 
N ¢=1 
+ Op(E-¥?) ¥ Op(Eio)/n + (0p(max (-4,m-12))) Op(E io) | 
v—1 i=1 
+ Y Op(-¥?) Op(Eio)/n- (104) 
i=1 


Now &j9 = O(1) Sig = Op(I-1/2) = op(m-"*) and with that the remainders are 
op(m-/2); for instance, the last but one summand is 


op(max (J-4m-4, m-!4)) = op(m-1?). 


Except for stochastically bounded terms there only occur linear combinations 
of products of the components of 6; an déj) in the second item. These themselves 
are again bounded functions of $i9, which have expectation zero by Theorem 
3.5.8. Then this term is of the order 


Op(||Ciol|?) = Op(I-4) = op(m12), 
For the summand we obtain Op(m-¥?). 
Theorem 3.5.13 


£m (2, — m)} > N(O, 271). (105) 


Proof. Because of the boundedness of the derivatives of 7 and JJ — O(1) it 
follows for 

L-U2g* :— Oar ig(S yl)-2 & iol? (106) 
according to V8 that 

B\l-e8 2** < Op(1) H\P%igl2** < KOp(1) < K. (107) 
Further, 

D(-V68) = Laight (108) 
hence by V6, 


m—>co i=1 m 


lim D ( S ret | = lim Z,l-! = lim n,,Z, =limn-O(1)= 00. (109) 
m 


Thereby the limit theorem of Ljapunow (cf. Loève, 1977, p. 289; and Remark
3.5.2) yields


24 {253 Sel: KOS )e (110) 
ii Hy 
By V6 and from the moment convergence theorem (cf. Loève, 1977, p. 186):
m 
L {im SS. et} +> N(0, 55%). ' (111) 
i=1 


The left-hand side is \m (%_ — mo) + 0,(1) according to Theorem 3.5.9, hence 


(111) also yields the limit distribution of Vm (%. —7%). 


Remark 3.5.2 The relation (107) only provides a condition for the limit
theorem that can be checked in applications. The convergence already holds for
(univariate) random variables $x_i = l^{-1/2}\varepsilon_i^*$, $i = 1, 2, \dots$, if for
some $\kappa > 0$, $c > 0$, and $D\bigl(\sum_{i=1}^{n} x_i\bigr) =: s_n^2 \to \infty$ the relation

$E|x_i|^{2+\kappa} \le c\, D x_i, \quad i = 1, 2, \dots$  (112)

is true (Loève, 1977, p. 289). We briefly demonstrate that the condition
(107) and V7 yield these general conditions of the Ljapunow theorem, because
they yield for $\tau \le \kappa/2$


$E|x_i|^{2+\tau} \le (E|x_i|^2)^{1/2}\,(E|x_i|^{2+2\tau})^{1/2}.$  (113)
We choose a constant c > 0 so that Hlajc|? > 1, then it follows from (113) and 


= (E\cae,|2)/2 — (EB \car,|2+2" uate") = (E\car,|2+2* 1/2 (114) 
that 
Cr rn |a,|2* 5 = c2 HB |x; |” (pd Je hs i (115) 




With this, 
r= k= 22, CLK, (116) 


for (112), where we also take into consideration that $E x_i = 0$. The same limit
theorem holds for multivariate $x_i$, too, where in (112) the norm $\|x_i\|$ appears
instead of the absolute value $|x_i|$. The relation (112) is fulfilled for random
variables $x_i$ with uniformly bounded support (Loève, 1977, p. 289), that is, in
almost all practical cases.


The initial iteration $\pi_1$


It can be shown that under further assumptions the OLSE satisfies condition
V2. The OLSE $\pi_1$ minimizes


/ 


~ 


bie, 2) = — ¥ we — len ale (117) 


nN j=1 


For details see Fuller and Wolter (1982). 


Unknown and different covariances 


With different covariances over the single design points the proof remains
essentially the same. Instead of the Ljapunow limit theorem, a central limit
theorem for linear forms has to be applied. Details are to be found in Höschel (1979)
for the case of equal numbers of replications over the single design points.

With unknown covariances we also obtain the asymptotic normality
for the estimators given in (3.3.85) under certain additional assumptions on the
number of replications compared with the size of the experimental design. For
this purpose the asymptotic investigations can be carried out for random
weighting matrices, as was done in Chapter 1 for regression models. A detailed
derivation is omitted here.


3.6 Testing hypotheses in linear functional relations 


Modelling with LIFU faces the following problems: 


— How large is the dimension of the subspace $\mathcal{L}$ in which the design points $\mu_i$
are contained?
— Can certain subspaces be left out of the model?


Statistics for testing corresponding hypotheses turn out to be closely related 
to linear hypotheses in the multivariate linear regression model. 

The present section is based on Anderson (1951a, sections 3 and 4). We start 
from the LIFU* with linear regression part according to (3.1.28), where a 




normalization as in (3.2.64) is assumed for the matrix of the linear restrictions L: 
Z—=MU+M,V-+6, DO bg Os 
Pe Nee Ue OOK oe 8 ea.0y lL Ls ti Te 
Mie Monn, W:=(TiVIEMixm, M := (M,} M,). 


Here the observations $Z$ and the matrix $W$ (of instrumental variables) are
known.

Unlike the linear regression model, where $W$ would correspond to the
regressor matrix, in this model we do not test whether $EZ$ lies in a subspace
that is independent of the regressors, but rather which consequences result from the
special choice of $L^{\perp}$.

First we want to investigate whether the number of restrictions may perhaps
be reduced. Thus we consider the test problems:

(T1)  $H_1: r[L^{\perp}] = q_1$;  $K_1: r[L^{\perp}] = q_2$,  $q_2 < q_1$

(T2)  $H_2: r[L^{\perp}] = q$;  $K_2: r[L^{\perp}] < q$.

We investigate whether certain given restrictions arise in the model, i.e.
for fixed $L_0 \in M_{p \times q}$ we test

(T3)  $H_3: L^{\perp} = L_0$;  $K_3: L^{\perp} \ne L_0$.

Finally, the problem

(T4)  $H_4: L^{\perp} = 0$;  $K_4: L^{\perp} \ne 0$


is considered, with which we test whether there exist linear restrictions in the 
model at all. Although this question should strictly speaking be answered at 
the beginning of each further investigation of the LIFU*, this problem is 
treated after (T1)—(T3), because the respective test statistics can be derived 
from the results belonging to (T1)—(T3). 


3.6.1 Tests on the dimension of the subspace 


We test the hypothesis that the rank of $L^{\perp}$ is exactly $q_1$ against the alternative
that it is $q_2$, where $q_2$ is a fixed number not greater than $q_1$, and $q_1 < p$.
According to Theorem 3.2.9 the likelihood ratio criterion is, for normally distributed
observations,


i a ve 
a il (1+ a)" TI (1 + Ayes — I] (4+ A) 8 (1) 
ee eo l=q2t1 


Ay = A(Sz-wQz.v-v)> Ay Ss SA, 




From this we get as the likelihood ratio test for $q_1$ against all alternatives $q_2 < q_1$:
1 : 
—2 loga = >} log (1 + 4). (2) 
I=1 
For large n this yields approximately 
q 
n > A 
1=1 
a criterion which was suggested by Fisher (1938) and Hsu (1941a, b). Anderson 
(1951a) showed the convergence 
L{—2 log Aim} > Xi; a=qnlrn —pt+), (3) 


in case Sy.y/m possesses a regular limit and L+ is of rank q,. For (T2) we 
obtain as test criterion 


ve 


A=] +a). ag 


l=1 


A further test based on newer methods was suggested by Fujikoshi and Veitch
(1979, p. 351).


3.6.2 Tests under a given subspace 


For fixed $L^{\perp} \in M_{p \times q}$, according to (3.2.107) we have

$Q(L^{\perp}) = L^{\perp\prime} Q_{Z \cdot U \cdot V} L^{\perp}$  (5)
$= L^{\perp\prime}(Q_{Z \cdot W} - Q_{Z \cdot V}) L^{\perp} = L^{\perp\prime} Z (P_W - P_V) Z' L^{\perp}.$  (6)

As is obvious from the representation of $EZ$ (cf. Bunke and Bunke (1985),
[A 2.13]),

$Q(L^{\perp}) \sim W_q(n_1, L^{\perp\prime} \Sigma L^{\perp}, L^{\perp\prime} M_1 U (P_W - P_V)).$  (7)

Under the hypothesis in (T3),

$Q = Q(L_0) \sim W_q(n_1, L_0' \Sigma L_0).$  (8)

Moreover,

$S = L_0' S_{Z \cdot W} L_0 \sim W_q(m - n, L_0' \Sigma L_0)$  (9)

is distributed independently of $Q$ because of $(P_W - P_V) P_{W^{\perp}} = 0$. Thus,
the criteria developed to test linear hypotheses (cf. Bunke and Bunke,
1986, 5.1.3, 5.1.5) can also be applied to this problem. Here we only mention
the likelihood ratio test. By (3.2.107),

$Q_{Z \cdot U \cdot V} + S_{Z \cdot W} = S_{Z \cdot V}.$  (10)

Under the hypothesis we then have

$\Lambda = \det[S]\,\det[S + Q]^{-1} = \det[L_0' S_{Z \cdot W} L_0]\,\det[L_0' S_{Z \cdot V} L_0]^{-1} \sim U(q, m - n, n_1)$  (11)

(for the definition of $U(\cdot,\cdot,\cdot)$, cf. Bunke and Bunke, 1986, [A 2.24]). As
likelihood ratio test there results

$\varphi(z) = 1$ for $\Lambda < U_{\alpha}(q, m - n, n_1)$, and $\varphi(z) = 0$ otherwise.  (12)

For $q = 1$ and $q = 2$ we obtain an F-statistic for this (cf. Bunke and Bunke,
1986, 5.1.5. B, table 5.1, where further approximations for the test statistics
(11)-(15) are also given).


case        test statistic                                                        distribution

$q = 1$:    $T_1 = \dfrac{(m - n)(1 - \Lambda)}{n_1 \Lambda}$                      $F_{n_1,\, m-n}$  (13)

$q = 2$:    $T_2 = \dfrac{(m - n - 1)(1 - \sqrt{\Lambda})}{n_1 \sqrt{\Lambda}}$    $F_{2n_1,\, 2(m-n-1)}$  (14)

$m \gg n$:  $-(m - n) \log \Lambda$                                               $\chi^2_{q n_1}$ (approximately)  (15)


The latter approximation is true if $S_{Z \cdot V}/m$ has a regular limit. The test is
consistent and unbiased as in linear regression models.
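The F-transformations of the likelihood ratio statistic for q = 1 and q = 2 can be illustrated by a small sketch. It relies on the standard transformations of Wilks' Lambda for dimension one and two; the function names and the mapping "error degrees of freedom = m - n, hypothesis degrees of freedom = n_1" are assumptions made for this illustration, not part of the text.

```python
def wilks_lambda(S, Q):
    """Lambda = det(S) / det(S + Q) for 1x1 or 2x2 matrices given as nested lists."""
    def det(a):
        if len(a) == 1:
            return a[0][0]
        if len(a) == 2:
            return a[0][0] * a[1][1] - a[0][1] * a[1][0]
        raise ValueError("this sketch only handles q = 1 or q = 2")

    total = [[s + t for s, t in zip(rs, rq)] for rs, rq in zip(S, Q)]
    return det(S) / det(total)


def wilks_to_f(lam, q, df_error, df_hyp):
    """Exact F-transformation of Wilks' Lambda for dimension q = 1 or q = 2.

    df_error and df_hyp are the Wishart degrees of freedom of S and Q
    (assumed here to be m - n and n_1, respectively).
    Returns the statistic and its (numerator, denominator) degrees of freedom.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("Lambda must lie in (0, 1]")
    if q == 1:
        stat = df_error * (1.0 - lam) / (df_hyp * lam)
        return stat, (df_hyp, df_error)
    if q == 2:
        root = lam ** 0.5
        stat = (df_error - 1) * (1.0 - root) / (df_hyp * root)
        return stat, (2 * df_hyp, 2 * (df_error - 1))
    raise ValueError("exact F-transformations are only used here for q = 1, 2")
```

For example, with m = 20, n = 4 and n_1 = 2 one would call `wilks_to_f(lam, q, 16, 2)` and compare the result with the corresponding F-quantile.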


Known error-covariance. In this case 
O = OLR) = LYE-Oz.y.yS-WLd ~ Wi (m1, Lp'Ly) 


under the hypothesis Z+ = L;. 

This makes it possible to obtain $\chi^2$-statistics from $Q$ by forming quadratic
forms of the kind $a'Qa$. The tests result from determining the critical
regions via $\alpha$-quantiles of the $\chi^2$-distribution.


3.6.3 Tests on the existence of a linear functional relation


We consider the problem (T4). In this case, under the hypothesis there does
not exist any vector $l^{\perp} \in \mathbb{R}^p$ for which

$l^{\perp\prime} M_1 = 0.$  (16)

This suggests the following test: the hypothesis is rejected if the corresponding
test rejects (13) for all $l^{\perp} \in \mathbb{R}^p$, which happens if the minimum of $T(l^{\perp})$ over
$l^{\perp}$ is greater than $F_{\alpha;\, n_1,\, m-n}$. But the minimum is the smallest eigenvalue $\lambda_1$
of $S_{Z \cdot W}^{-1} Q_{Z \cdot U \cdot V}$. Thus we get as an $\alpha$-test:
Ny 
1 UA Emin 
p(z) = (17) 
0 otherwise 


This technique can also be applied to test $r[L^{\perp}] = q$ against $r[L^{\perp}] < q$. As
the critical region one obtains (Anderson, 1951a, 4.13)


q 
Il (i Ae 4,)} Ss WAGE m— i, M); (18) 
l=1 
where 
A(SzwOz.u.-v) S ++» S Aq SZwz.0.r)- 
3.7 Confidence regions in linear functional relations 


As in Section 3.6 we consider LIFU* with a linear regression part. So far
the available results have not been as comprehensive as the corresponding
results for linear regression models (cf. Bunke and Bunke, 1986, ch. 6). The
confidence region is constructed based on the test for unknown error covariance
by Anderson (1951a) given in Section 3.6. The respective results for known
error covariance follow from Section 3.6.2; details for bivariate
LIFU* are also to be found in Kendall and Stuart (1961, 29.22). However,
only the case $q = 1$ is taken into consideration. For $q > 1$ there arise
identifiability problems in the determination of $L^{\perp}$; for further explanations see
Anderson (1951a, 5.2).


3.7.1 The case of subspaces of codimension one 


In this case we have $L^{\perp} = l_0 \in \mathbb{R}^p$ with a normalization $l_0' A l_0 = 1$ for an
arbitrary $A \in M_p^{\ge}$ with $r[(M_1 \mid A)] > r[M_1]$. Consequently, a confidence
region for $L^{\perp}$ consists of those vectors $l \in \mathbb{R}^p$ for which the test given in (3.6.13)
does not reject the hypothesis.

For known covariance $\Sigma$ we can achieve a simplification by using $A = \Sigma$,
since $l' Q_{Z \cdot U \cdot V}\, l$ and $l' S_{Z \cdot W}\, l$ are then independently $\chi^2$-distributed. In the first
case

$C = \{\, l \mid l' A l = 1,\ T_1(l) \le F_{\alpha;\, n_1,\, m-n} \,\}$  (1)

results as the $(1 - \alpha)$ confidence region.




For $A = \Sigma$ there results an $\alpha$-confidence region at the level $\alpha = (1 - \alpha_1)(1 - \alpha_2)$
as the set of those $l \in \mathbb{R}^p$ for which


VOzuvl S Nats (2) 
Neson = USz.wl = Vin (3) 


where $\alpha_1$ and $\alpha_2$ are chosen in such a way that for $l = l_0$ the significance
level is attained.


3.7.2 Consistency of confidence regions 


Among the possible confidence regions one wants to choose consistent ones.
That is, for any fixed $l_0$ the confidence region becomes arbitrarily
small as $m \to \infty$, for an arbitrarily high confidence level $1 - \alpha$.

We will demonstrate this for a special case. From inequality (2) we get

$l'(Q/m)\, l \le \chi^2_{\alpha_1;\, n_1}/m.$  (4)

The right-hand side becomes arbitrarily small for large $m$:


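$Q/m = \dfrac{1}{m}\, Z \tilde U'(\tilde U \tilde U')^{-1}\, \tilde U \tilde U'\, (\tilde U \tilde U')^{-1} \tilde U Z'$

with

$\tilde U = U(I - P_{V'}), \qquad \lim \dfrac{1}{m}\, \tilde U \tilde U' \;\Bigl(= \lim \dfrac{1}{m}\, S_{U \cdot V}\Bigr) \in M_{n_1}^{>}.$  (5)

Moreover, $Z \tilde U'(\tilde U \tilde U')^{-1}$ is arbitrarily close to $M_1$, with a probability converging
to one. Provided that $l_1' M_1 \ne 0$, $m$ can be chosen so large that $l_1$
satisfies (4) with arbitrarily small probability.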


3.7.3 Bivariate linear models 


From the results of Section 3.5.1, based on the asymptotic distributions, it is
easy to construct the related asymptotic confidence intervals, which will be
shown for the case of nonreplicated observations. For this purpose we denote
the estimator of the asymptotic covariance derived in (3.5.40) by $\hat D = \hat D(\hat\alpha, \hat\beta)$.
Then it approximately holds that

$[\hat\alpha, \hat\beta] \sim N([\alpha, \beta], \hat D);$

hence a $\gamma$-confidence region for $[\alpha, \beta]$ is given by

$C = \{\, [a, b] \mid \|[a, b] - [\hat\alpha, \hat\beta]\|^2_{\hat D^{-1}} \le \chi^2_{2;\gamma} \,\},$

where $\chi^2_{2;\gamma}$ is the $\gamma$-fractile of a central $\chi^2$-distribution with two degrees of
freedom. The inverse $\hat D^{-1}$ of the $(2 \times 2)$-matrix can even be given explicitly,
but we omit this here.
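The membership test for such an elliptical region is easy to carry out. The following is a minimal sketch, assuming that the estimate and its (2 x 2) covariance estimate are already available; the function names are hypothetical. It uses the explicit inverse of a (2 x 2)-matrix and the fact that the chi-square distribution with two degrees of freedom has the distribution function 1 - exp(-x/2), so that its quantile is available in closed form.

```python
import math


def chi2_quantile_2df(gamma):
    """gamma-quantile of the central chi-square distribution with 2 degrees of freedom."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    # Solve 1 - exp(-x / 2) = gamma for x.
    return -2.0 * math.log(1.0 - gamma)


def inverse_2x2(d):
    """Explicit inverse of a regular (2 x 2)-matrix given as nested lists."""
    (a, b), (c, e) = d
    det = a * e - b * c
    if det == 0.0:
        raise ValueError("matrix is singular")
    return [[e / det, -b / det], [-c / det, a / det]]


def in_confidence_region(point, estimate, cov, gamma):
    """True if `point` lies in the asymptotic gamma-confidence ellipse around `estimate`."""
    inv = inverse_2x2(cov)
    u = point[0] - estimate[0]
    v = point[1] - estimate[1]
    # Squared distance in the norm induced by the inverse covariance matrix.
    dist2 = u * (inv[0][0] * u + inv[0][1] * v) + v * (inv[1][0] * u + inv[1][1] * v)
    return dist2 <= chi2_quantile_2df(gamma)
```

A call such as `in_confidence_region((a, b), (alpha_hat, beta_hat), D_hat, 0.95)` then decides whether the pair (a, b) belongs to the region.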




3.8 Numerics


In this section we describe some algorithms for calculating the WLSE for which
results from practical applications are already available, or which
can be realized simply by standard programs, especially for:


— bivariate and multivariate LIFU; 
— polynomial relations; 

— explicit models; 

— implicit models. 


In principle, any general method for minimizing quadratic functionals with
restrictions on the parameters can be used to compute the WLSE, in particular
general methods for the solution of the nonlinear normal equations. However,
the corresponding minimization problems involve $np$ variables in the
experimental design and at least $nq$ restrictions. Thus for sample sizes of about
50, which are quite frequent in practical applications, most of the available
general programs for solving the resulting normal equations or the corresponding
minimization problems fail.

However, practicable realizations of the known iteration methods, such as 
the methods of Gauss-Newton, Newton-Raphson, and others, can be derived 
from the special structure of the normal equations. This section aims at offering 
an insight into the problems of numerics in errors-in-variables models to the 
statistician and at showing some approaches to the solution of the peculiar 
large-dimensional problems arising in errors-in-variables models to numerical 
mathematicians. 

The realization in concrete computer programs of course depends on the
computing facilities available. Further, no detailed convergence conditions
for the described methods will be given, as the related special numerical
investigations would go beyond the scope of this book. The fundamental convergence
statements for the corresponding algorithms are collected in standard numerical
texts; we refer especially to the monograph by Schwetlick (1979).


3.8.1 Linear functional relations 
3.8.1.1 Bivariate linear functional relations with nonrandom unobservable variables 
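The WLSE shall be calculated for the simple bivariate LIFU*

$\eta_i = \alpha + \beta \xi_i, \qquad z_i = \mu_i + \zeta_i, \qquad i = 1, \dots, n,$  (1)

with $z_i = [x_i, y_i]$, $\mu_i = [\xi_i, \eta_i]$ and the covariance

$D\zeta_i = \mathrm{Diag}(\sigma_{\delta i}, \sigma_{\varepsilon i}).$  (2)

The likelihood criterion is

$k(\mu) = \sum_{i=1}^{n} \bigl(\sigma_{\delta i}^{-1}(x_i - \xi_i)^2 + \sigma_{\varepsilon i}^{-1}(y_i - \alpha - \beta \xi_i)^2\bigr).$  (3)

The normal equations result from the derivatives of $k$ (Williamson, 1968;
see also Section 3.2.6):

$\hat\alpha + \hat\beta \bar x = \bar y,$  (4)

$\bar x = \sum_{i=1}^{n} s_i x_i \Big/ \sum_{i=1}^{n} s_i, \qquad \bar y = \sum_{i=1}^{n} s_i y_i \Big/ \sum_{i=1}^{n} s_i, \qquad s_i = (\sigma_{\varepsilon i} + \beta^2 \sigma_{\delta i})^{-1} = s_i(\beta).$

Let $\tilde x_i = x_i - \bar x$ and $\tilde y_i = y_i - \bar y$; then $k_1(\mu) = \sum_{i=1}^{n} s_i(\beta \tilde x_i - \tilde y_i)^2$, and hence

$0 = \tfrac{1}{2}\,\partial_\beta k_1 = \sum_{i=1}^{n} \bigl(s_i \tilde x_i(\beta \tilde x_i - \tilde y_i) - s_i^2 \beta \sigma_{\delta i}(\beta \tilde x_i - \tilde y_i)^2\bigr).$  (5)

We get the following iteration method (Williamson, 1968):

$\beta_{k+1} \sum_{i=1}^{n} s_i t_i \tilde x_i = \sum_{i=1}^{n} s_i t_i \tilde y_i, \qquad k = 0, 1, 2, \dots,$  (6)

$s_i = s_i(\beta_k), \quad t_i = t_i(\beta_k) = s_i(\beta_k)\,(\sigma_{\varepsilon i} \tilde x_i + \beta_k \sigma_{\delta i} \tilde y_i), \quad \tilde x_i = \tilde x_i(\beta_k), \quad \tilde y_i = \tilde y_i(\beta_k).$  (7)

The estimators for $\xi_i$ result in

$\hat\xi_i = s_i(\hat\beta)\,\bigl(\sigma_{\varepsilon i} x_i + \hat\beta \sigma_{\delta i}(y_i - \hat\alpha)\bigr).$  (8)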
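The step from the stationarity condition (5) to the iteration (6) is short but easy to miss. The following sketch spells it out, assuming the readings $s_i = (\sigma_{\varepsilon i} + \beta^2\sigma_{\delta i})^{-1}$ and $t_i = s_i(\sigma_{\varepsilon i}\tilde x_i + \beta\sigma_{\delta i}\tilde y_i)$ for the weights in (4) and (7):

```latex
\begin{align*}
\tilde x_i - \beta\sigma_{\delta i}\, s_i(\beta\tilde x_i - \tilde y_i)
  &= s_i\bigl[(\sigma_{\varepsilon i} + \beta^2\sigma_{\delta i})\tilde x_i
      - \beta^2\sigma_{\delta i}\tilde x_i + \beta\sigma_{\delta i}\tilde y_i\bigr]
   = t_i, \\
0 = \sum_i s_i(\beta\tilde x_i - \tilde y_i)
      \bigl[\tilde x_i - \beta\sigma_{\delta i}\, s_i(\beta\tilde x_i - \tilde y_i)\bigr]
  &= \sum_i s_i t_i(\beta\tilde x_i - \tilde y_i)
  \;\Longleftrightarrow\;
  \beta\sum_i s_i t_i\tilde x_i = \sum_i s_i t_i\tilde y_i .
\end{align*}
```

Freezing $s_i$, $t_i$, $\tilde x_i$ and $\tilde y_i$ at the current value $\beta_k$ and solving the last identity for $\beta$ gives the next iterate $\beta_{k+1}$.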


With replicated observations

$z_{ij} = \mu_i + \zeta_{ij}$  (9)

we obtain a WLSE, which is the MLE in the normal case, with the substitutions

$z_i \to \bar z_i, \qquad \sigma_{ai} \to \sigma_{ai}/m_i, \quad a = \delta, \varepsilon.$  (10)

For unknown error variances $\sigma_{\delta i}$, $\sigma_{\varepsilon i}$ the method can be applied analogously.
Instead of the $x_i$, $y_i$ we again take the $\bar x_i$, $\bar y_i$, and $\sigma_{\delta i}$, $\sigma_{\varepsilon i}$ have to be replaced by
the estimators $w_{xx,i}/(m_i - 1)$, $w_{yy,i}/(m_i - 1)$ by Theorem 3.2.9. This estimator
even yields the MLE in the normal case.
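The iteration of this section can be sketched in a few lines. The following is a minimal illustration in plain Python, not a reference implementation: the function name, the ordinary least squares starting value and the stopping rule are assumptions made here, and the weights are taken as $s_i = (\sigma_{\varepsilon i} + \beta^2 \sigma_{\delta i})^{-1}$ and $t_i = s_i(\sigma_{\varepsilon i}\tilde x_i + \beta\sigma_{\delta i}\tilde y_i)$.

```python
def williamson_fit(x, y, var_x, var_y, tol=1e-12, max_iter=200):
    """Fit eta = alpha + beta * xi with errors in both coordinates.

    var_x[i] and var_y[i] are the known error variances of x[i] and y[i].
    Returns (alpha, beta, xi_hat), where xi_hat are the adjusted abscissae.
    """
    n = len(x)

    def weighted_means(beta):
        # s_i = 1 / (var_y_i + beta^2 var_x_i) and the weighted means of x and y.
        s = [1.0 / (vy + beta * beta * vx) for vx, vy in zip(var_x, var_y)]
        total = sum(s)
        xbar = sum(si * xi for si, xi in zip(s, x)) / total
        ybar = sum(si * yi for si, yi in zip(s, y)) / total
        return s, xbar, ybar

    # Starting value (an assumption of this sketch): ordinary least squares slope.
    mx, my = sum(x) / n, sum(y) / n
    beta = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

    for _ in range(max_iter):
        s, xbar, ybar = weighted_means(beta)
        xt = [xi - xbar for xi in x]
        yt = [yi - ybar for yi in y]
        t = [si * (vy * a + beta * vx * b)
             for si, vx, vy, a, b in zip(s, var_x, var_y, xt, yt)]
        num = sum(si * ti * b for si, ti, b in zip(s, t, yt))
        den = sum(si * ti * a for si, ti, a in zip(s, t, xt))
        beta_new = num / den
        converged = abs(beta_new - beta) <= tol * max(1.0, abs(beta_new))
        beta = beta_new
        if converged:
            break

    # Intercept and adjusted abscissae at the final slope.
    s, xbar, ybar = weighted_means(beta)
    alpha = ybar - beta * xbar
    xi_hat = [si * (vy * xi + beta * vx * (yi - alpha))
              for si, vx, vy, xi, yi in zip(s, var_x, var_y, x, y)]
    return alpha, beta, xi_hat
```

For replicated observations one would pass the cell means as `x`, `y` and the variances divided by the numbers of replications as `var_x`, `var_y`. With equal variances in both coordinates the fixed point of the iteration is the slope of orthogonal regression, which gives a simple plausibility check.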


3.8.1.2 Bivariate linear functional relations with random unobservable variables 


The method to determine the MLE in the normal case, which was described
in Section 3.2.1 (cf. Table 3.2.1), can easily be carried out for the bivariate LIFU
by standard programs.




3.8.1.3 Multivariate linear functional relationships with nonrandom unobservable variables


It is mainly the models with independent observations over the design points 
that are of practical interest here. The WLSE with known covariance results 
by Section 3.2.3 (Theorem 3.2.3), the MLE for unknown covariance results by 
Section 3.2.5 (Theorem 3.2.9) as WLSE with the weighting matrix Sz.y. 
Both estimators can be computed by means of standard programs for eigenvalue
problems.


3.8.2 Bivariate polynomial relations 
3.8.2.1 Polynomial relations 


Polynomial relations are the appropriate models for curve-fitting in many 
practical applications: 


if 


= 28 Vig 7(E A) Oe Re a et Ol, (11) 


For the observations $z_i = [x_i, y_i]$ of $[\xi_i, \eta_i]$ let the covariance matrix be
$\mathrm{Diag}(\sigma_{\delta i}, \sigma_{\varepsilon i})$. The WLSE for these models can also be obtained with the
methods for general models, but exploiting the special form yields
several simplifications. We introduce the method suggested by O'Neill, Sinclair,
and Smith (1969), a modified Newton-Raphson algorithm, which can of
course be extended to general models (cf. Section 3.8.3). Let


n = r(é, 2) = > ap, (4), [WO nay Tt | (12) 
k=0 


be an expansion of $r(\xi)$ in polynomials $p_k$ of degree $k$. By a suitable choice
of the $p_k$ we can later achieve simplifications in the algorithm. The distance
to be minimized is

$k(\pi, \xi_{(n)}) = \sum_{i=1}^{n} \bigl(\sigma_{\delta i}^{-1}(x_i - \xi_i)^2 + \sigma_{\varepsilon i}^{-1}(y_i - r(\xi_i, \pi))^2\bigr).$  (13)

We take $\pi_1$, $\xi_{(n)1}$ as initial approximations for the true parameters $\pi_0$, $\xi_{(n)0}$.


3.8.2.2 Newton-Raphson procedures 


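We have the equations $\partial_\psi k = 0$, $\psi = [\pi, \xi_{(n)}]$, for the stationary points of $k$.
For the second iteration $\psi_2 = \psi_1 + \Delta_1\psi$ we obtain from the Taylor series

$\partial_\psi k(\psi_2) \approx \partial_\psi k_1 + \partial_{\psi,\psi} k_1\, \Delta_1\psi = 0$  (14)

the following $r + n$ nonlinear equations:

$\partial_{\psi,\psi} k_1\, \Delta_1\psi = -\partial_\psi k_1.$  (15)

Now

$\partial_\pi k_1 = -2 \sum_{i=1}^{n} \sigma_{\varepsilon i}^{-1}(y_i - r_{i1})\, \partial_\pi r_{i1},$  (16)

$\partial_{\xi_i} k_1 = -2\bigl(\sigma_{\delta i}^{-1}(x_i - \xi_{i1}) + \sigma_{\varepsilon i}^{-1}(y_i - r_{i1})\, \partial_\xi r_{i1}\bigr), \quad i = 1, \dots, n,$  (17)

$\partial_{\pi,\pi} k_1 = -2 \sum_{i=1}^{n} \sigma_{\varepsilon i}^{-1}\bigl(-\partial_\pi r_{i1}\, \partial_\pi r_{i1}' + (y_i - r_{i1})\, \partial_{\pi,\pi} r_{i1}\bigr).$  (18)

It is obvious that $\partial_{\pi,\pi} r_i = 0$ here. Then $\partial_{\pi,\pi} k_1$ results in this special model
as the corresponding block of the information matrix (cf. (3.5.4)). In the general
case, $\partial_{\psi,\psi} k_1$ is more complicated (cf. Section 3.8.3). If we choose the polynomials
$p_k$ orthogonally with respect to the weights $\sigma_{\varepsilon i}^{-1}$, we get $\partial_{\pi,\pi} k_1$ as a diagonal
matrix from (18). $\partial_{\xi,\xi} k_1$ has diagonal form, too. Moreover, we obtain the
elements of $\partial_{\pi,\xi_{(n)}} k_1$ according to

$\partial_{\pi_l}\partial_{\xi_i} k_1 = -2\,\sigma_{\varepsilon i}^{-1}\, \partial_\xi\bigl((y_i - r_{i1})\, p_l(\xi_{i1})\bigr).$  (19)

Based on this representation, O'Neill et al. (1969) suggest the following
approximate solution of (15): first we take into consideration that the elements of
$\partial_{\pi,\xi} k_1$ contain fewer terms than those of $\partial_{\pi,\pi} k_1$ and $\partial_{\xi,\xi} k_1$ and are typically of
smaller order. The matrix $\mathrm{Diag}\bigl((\partial_{\pi,\pi} k_1)^{-1}, (\partial_{\xi,\xi} k_1)^{-1}\bigr)$ may serve as the first
approximation of $(\partial_{\psi,\psi} k_1)^{-1}$ and then we get

$\pi_2 = \pi_1 - (\partial_{\pi,\pi} k_1)^{-1}\, \partial_\pi k_1$  (20)

in the form

$\pi_{l2} = \sum_{i=1}^{n} \sigma_{\varepsilon i}^{-1} y_i\, p_l(\xi_{i1}) \Big/ \sum_{i=1}^{n} \sigma_{\varepsilon i}^{-1} p_l(\xi_{i1})^2.$  (21)

Similarly, we get

$\xi_{i2} = \xi_{i1} + \dfrac{\sigma_{\varepsilon i}^{-1}(y_i - r_{i1})\, \partial_\xi r_{i1} + \sigma_{\delta i}^{-1}(x_i - \xi_{i1})}{\sigma_{\varepsilon i}^{-1}\bigl((\partial_\xi r_{i1})^2 - (y_i - r_{i1})\, \partial_{\xi,\xi} r_{i1}\bigr) + \sigma_{\delta i}^{-1}}.$  (22)

In (22) we can obtain better values of $\xi_{i2}$ if the new values of $\pi_2$ from (21)
are taken to calculate the new values of $r_i$, $\partial_\xi r_i$, $\partial_{\xi,\xi} r_i$.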
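The update of the design points is a scalar Newton step on the i-th summand of the distance, $\sigma_{\delta i}^{-1}(x_i - \xi)^2 + \sigma_{\varepsilon i}^{-1}(y_i - r(\xi))^2$, with the parameter held fixed. A minimal sketch of this step, iterated to convergence for a single observation, could look as follows; the function name, the callable arguments `r`, `dr`, `d2r` (the curve and its first two derivatives in xi) and the stopping rule are assumptions of this illustration.

```python
def adjust_design_point(x, y, var_x, var_y, r, dr, d2r, tol=1e-12, max_iter=100):
    """Newton iteration for the abscissa xi minimizing
    (x - xi)^2 / var_x + (y - r(xi))^2 / var_y, started at xi = x."""
    xi = x  # natural starting value: the observed abscissa
    for _ in range(max_iter):
        resid = y - r(xi)
        # Numerator: minus one half of the first derivative of the distance.
        num = resid * dr(xi) / var_y + (x - xi) / var_x
        # Denominator: one half of the second derivative of the distance.
        den = (dr(xi) ** 2 - resid * d2r(xi)) / var_y + 1.0 / var_x
        step = num / den
        xi += step
        if abs(step) <= tol * max(1.0, abs(xi)):
            break
    return xi
```

For a straight line the first step already gives the exact minimizer; for curved relations the iteration may converge to any stationary point, so the starting value matters.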


3.8.2.3 The algorithm 


As the initial approximation we choose $\xi_{(n)1} = x_{(n)}$; the orthogonal
polynomials $p_l$ are obtained from the relations (Forsythe, 1957)

$p_0(\xi) = 1, \qquad p_1(\xi) = \xi - s_1, \qquad p_l(\xi) = (\xi - s_l)\, p_{l-1}(\xi) - t_l\, p_{l-2}(\xi),$  (23)

where

$s_l = \sum_{i=1}^{n} \sigma_{\varepsilon i}^{-1}\, \xi_{i1}\, p_{l-1}(\xi_{i1})^2 \Big/ \sum_{i=1}^{n} \sigma_{\varepsilon i}^{-1}\, p_{l-1}(\xi_{i1})^2,$  (24)

$t_l = \sum_{i=1}^{n} \sigma_{\varepsilon i}^{-1}\, \xi_{i1}\, p_{l-1}(\xi_{i1})\, p_{l-2}(\xi_{i1}) \Big/ \sum_{i=1}^{n} \sigma_{\varepsilon i}^{-1}\, p_{l-2}(\xi_{i1})^2.$  (25)

According to (20) this yields a polynomial for which $\xi_{(n)1} = x_{(n)}$ provides a
curve-fitting in the regression case. As a simplification of the algorithm we
can use the derivatives for orthogonal polynomials by Smith (1965).
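The three-term recurrence and the resulting coefficient formula can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the weights stand for the inverse error variances of the ordinates, and only the values of the polynomials at the nodes are generated, which is all that the coefficient formula needs.

```python
def orthogonal_polynomial_values(nodes, weights, degree):
    """Values p_l(node_i), l = 0..degree, of polynomials that are orthogonal
    with respect to the discrete inner product sum_i weights[i] * f_i * g_i."""
    values = [[1.0] * len(nodes)]
    for l in range(1, degree + 1):
        prev = values[l - 1]
        norm_prev = sum(w * p * p for w, p in zip(weights, prev))
        s_l = sum(w * x * p * p for w, x, p in zip(weights, nodes, prev)) / norm_prev
        if l == 1:
            new = [x - s_l for x in nodes]
        else:
            prev2 = values[l - 2]
            norm_prev2 = sum(w * q * q for w, q in zip(weights, prev2))
            t_l = sum(w * x * p * q
                      for w, x, p, q in zip(weights, nodes, prev, prev2)) / norm_prev2
            new = [(x - s_l) * p - t_l * q for x, p, q in zip(nodes, prev, prev2)]
        values.append(new)
    return values


def fit_coefficients(values, weights, y):
    """Coefficient of p_l: sum_i w_i y_i p_l(x_i) / sum_i w_i p_l(x_i)^2."""
    return [sum(w * yi * p for w, yi, p in zip(weights, y, row))
            / sum(w * p * p for w, p in zip(weights, row))
            for row in values]
```

Because the polynomials are orthogonal, each coefficient is computed independently of the others, which is the simplification referred to in the text.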

Often we need the original form (11) of the estimated polynomial, which is 
obtained from the relations 


0 lige JoS>0 Ce Wp il. 
tn = } (26) 
fonmy e107 
Uner,isn = Uni — Sinsi,s — Casi forh <1 (27) 
r 
20) = OU 541 for 0) SRS Po (28) 


jal 


As with all such algorithms, the convergence depends on the initial
approximation. Unlike other numerical problems, in which there is no natural initial
approximation, we have a natural initial approximation for $\xi_{(n)}$ here,
namely $x_{(n)}$. With great probability $x_{(n)}$ lies sufficiently close to
$\xi_{(n)0}$ to guarantee the convergence of the algorithm to the WLSE
$\hat\xi$. Nevertheless, the sequence of iterations may converge to any stationary
point of $k_1(\pi, \xi_{(n)})$, although the probability of this will often be small because of the
accuracy of the initial approximation $x_{(n)}$. In applications where the WLSE has
to be computed as the global minimum, we will consequently apply special
methods to calculate all stationary points of $k_1$ and pick out the WLSE from
those (cf. Egerton and Laycock, 1979).


3.8.3 General models with errors-in-variables 

3.8.3.1 Conditions for an application of the procedures 

In this section we describe the known approaches to the iterative solution of 
normal equations, where we have to omit details. We use the compact way of 


writing (3.1.26): 


Zn) = Mino + O(n)> nyo € Re, (29) 


0= 8(M(nyo> Ilo) 5 Teg € RR‘, S = 8(n) ; IR™4u+4x > Rom (30) 




with c < d,, and for explicit models we set 
Rie ([xi, CAV ae hee & = Xn), etc., (31) 
Min) = Mn) (E(ny, %) = ([é:, ri(Si, m)\)ims ‘as ne (32) 


Let the weighting matrix $\Omega^{-1}$ for the calculation of the WLSE be known and
moreover let the usual regularity assumptions for $s$ be fulfilled, such as the
regularity of the functional matrix. In case the covariance of $z$ is not known,
it is possible with replicated observations to obtain the weighting matrix $\Omega^{-1}$
by means of an estimator for $D\zeta$ (cf. Section 3.2.7).

Starting from the likelihood function in the normal case,

$k_1 = k_1(\pi, \mu, \lambda) = \tfrac{1}{2}\,\|z - \mu\|^2_{\Omega^{-1}} + \lambda' s(\mu, \pi)$  (33)

results as the Lagrange function for the implicit model, and for the explicit
model

$k_1 = k_1(\pi, \xi) = \tfrac{1}{2}\,\|z - \mu(\xi, \pi)\|^2_{\Omega^{-1}}$  (34)

results as the minimization criterion.
We denote the initial approximation by $\pi_1$, $\mu_{(n)1}$, and the WLSE by $\hat\pi$, $\hat\mu$.

3.8.3.2 Gauss-Newton procedures 


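This method is based on the linearization of $s$ in each step of the iteration,
where we abbreviate $s(\mu_{(n)1}, \pi_1) =: s_1$ and $\mu_{(n)1} =: \mu_1$:

$s(\mu_{(n)2}, \pi_2) \approx s_1 + \partial_\pi s_1\,(\pi_2 - \pi_1) + \partial_\mu s_1\,(\mu_2 - \mu_1) =: \tilde s(\mu_2, \pi_2).$  (35)

Then we obtain the normal equations for $\pi_2$, $\mu_{(n)2}$ with the approximated
function:

$-\Omega^{-1}(z - \mu_2) + \partial_\mu s_1'\, \lambda = 0,$
$\partial_\pi s_1'\, \lambda = 0,$  (36)
$s_1 + \partial_\pi s_1\,(\pi_2 - \pi_1) + \partial_\mu s_1\,(\mu_2 - \mu_1) = 0.$

This yields corrections of the initial iteration after a simple transformation
(cf. Britt and Luecke, 1973):

$\mu_2 = z - \Omega\, \partial_\mu s_1'\,(\partial_\mu s_1\, \Omega\, \partial_\mu s_1')^{-1}\bigl(s_1 + \partial_\pi s_1\,(\pi_2 - \pi_1) + \partial_\mu s_1\,(z - \mu_1)\bigr),$  (37)

$\pi_2 - \pi_1 = -\bigl(\partial_\pi s_1'\,(\partial_\mu s_1\, \Omega\, \partial_\mu s_1')^{-1}\, \partial_\pi s_1\bigr)^{-1}\, \partial_\pi s_1'\,(\partial_\mu s_1\, \Omega\, \partial_\mu s_1')^{-1}\bigl(s_1 + \partial_\mu s_1\,(z - \mu_1)\bigr).$  (38)

Notice that $\mu_2$ does not have to lie on the manifold $S_{\pi_2}$.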
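The transformation that leads from the linearized normal equations to the two correction formulas can be sketched as follows. The abbreviations $S_\mu = \partial_\mu s_1$, $S_\pi = \partial_\pi s_1$ and $A = S_\mu \Omega S_\mu'$, as well as the symbol $\Omega$ for the inverse of the weighting matrix, are notational assumptions of this sketch; $A$ and $S_\pi' A^{-1} S_\pi$ are assumed to be regular.

```latex
\begin{align*}
-\Omega^{-1}(z-\mu_2) + S_\mu'\lambda = 0
  &\;\Longrightarrow\; \mu_2 = z - \Omega S_\mu'\lambda, \\
s_1 + S_\pi(\pi_2-\pi_1) + S_\mu(\mu_2-\mu_1) = 0
  &\;\Longrightarrow\; \lambda = A^{-1}\bigl(s_1 + S_\pi(\pi_2-\pi_1) + S_\mu(z-\mu_1)\bigr), \\
S_\pi'\lambda = 0
  &\;\Longrightarrow\; \pi_2-\pi_1
     = -\bigl(S_\pi' A^{-1} S_\pi\bigr)^{-1} S_\pi' A^{-1}\bigl(s_1 + S_\mu(z-\mu_1)\bigr).
\end{align*}
```

Substituting the expression for $\lambda$ back into the first line gives the update of $\mu$; the third line gives the update of $\pi$, which has to be computed first.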




For explicit models the iteration $\pi_2$ results from (38) in the same way, but
$\partial_\pi s$ and $\partial_\mu s$ have a simpler form than in the general case:


O78) = 0,(—Ni + rilSir, m))i—1 eo thie (O,ri(€i» ca eer eae (39) 
Qu81 = Diag ((Ari(En, m1)! —Ly)),_ 3, (40) 


For $q = 1$, $p = 2$, Dolby (1976b) showed that the iteration of the independent
variables $\xi_{(n)}$, which occurs in (37), can be obtained more simply for explicit
models by means of the information matrix:


ne 


UO eee ul 7 
( ) = Inf? (2, ema Q(z aE: /ny1))- (41) 
E(nyo Tee (ni 


Here Inf is the information matrix at the point (2, &,),) given in (3.5.4) to 
(3.5.9) and 


Aun = ((O1 Qri(En, m4)" (42) 
Oey btinya = Diag (es arrilEnn, IN IPee vce (43) 


From the way in which (41)—(43) are written, it is immediately obvious that 
the proof by Dolby (1976b) remains valid for general explicit models with 
prs: 

Höschel and Penev (1980) showed the global convergence of a regularized
Gauss-Newton procedure. Schwetlick and Tiller (1985, 1989) were able to reduce
the computational effort further and improve the numerical stability.


3.8.3.3 Simplified Gauss-Newton procedures 


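Deming (1943) suggested not performing the linearization at $(\pi_1, \mu_{(n)1})$ in each
step of the iteration, but linearizing $s$ at the point $(\pi_1, z_{(n)})$. We do not obtain
the WLSE $(\hat\pi, \hat\mu)$ even in case the initial approximation $\pi_1$, $\mu_{(n)1}$ lies in so small
a neighbourhood of $\hat\pi$, $\hat\mu$ that the complete Gauss-Newton method would
converge, but practical studies (O'Neill et al., 1969) show that this simplification
provides good estimates. These can be used as initial approximations for
the complete Gauss-Newton method if necessary. Now the objective function
to determine $\pi_2$, $\mu_{(n)2}$ is

$\tfrac{1}{2}\,\|z - \mu_{(n)2}\|^2_{\Omega^{-1}} + \lambda'\bigl(s(z, \pi_1) + \partial_\pi s_1\,(\pi_2 - \pi_1) + \partial_\mu s_1\,(\mu_{(n)2} - z)\bigr).$  (44)

We get

$\pi_2 - \pi_1 = -\bigl(\partial_\pi s_1'\,(\partial_\mu s_1\, \Omega\, \partial_\mu s_1')^{-1}\, \partial_\pi s_1\bigr)^{-1}\, \partial_\pi s_1'\,(\partial_\mu s_1\, \Omega\, \partial_\mu s_1')^{-1}\, s(z, \pi_1),$  (45)

$\mu_2 = z - \Omega\, \partial_\mu s_1'\,(\partial_\mu s_1\, \Omega\, \partial_\mu s_1')^{-1}\bigl(s(z, \pi_1) + \partial_\pi s_1\,(\pi_2 - \pi_1)\bigr).$  (46)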




3.8.3.4 Modified Gauss-Newton procedures 


In the Gauss-Newton method the iteration $\mu_{(n)2}$ does not lie on the manifold
$S_{\pi_2}$. It is natural to determine $\mu_{(n)2}$ as a WLSE with given $\pi_2$; hence we get
the following iteration process:

$\pi_1 \;\xrightarrow{\text{WLSE}}\; \mu_{(n)1} \in S_{\pi_1} \;\xrightarrow{\text{GN}}\; \pi_2 \;\xrightarrow{\text{WLSE}}\; \mu_{(n)2} \in S_{\pi_2} \;\xrightarrow{\text{GN}}\; \dots$  (47)

Let $\pi_1$ be the initial approximation of $\hat\pi$. The normal equations for $\mu_{(n)1}$ as
WLSE with respect to the observations $z_{(n)}$ on the surface $S_{\pi_1}$ are

$\Omega^{-1}(z - \mu_{(n)1}) + \partial_\mu s(\mu_{(n)1}, \pi_1)'\, \lambda_1 = 0,$
$s(\mu_{(n)1}, \pi_1) = 0.$  (48)

The Lagrange multiplier does not occur for explicit models and the normal
equation is simplified:

$(z - \mu_{(n)1})'\, \Omega^{-1}\, \partial_\xi \mu_{(n)1} = 0.$  (49)

The form of $\partial_\xi \mu_{(n)1}$ results from (43). With this we get the method for explicit
models suggested by Fuller and Wolter (1982) (cf. Sections 3.3.3 and 3.5.4).
$\pi_2$ is determined by the same linearization as in the Gauss-Newton method,
i.e. $\pi_2$ results from equation (38), which in the case of repeated observations
contains equation (3.3.17), derived independently for explicit models,
as a special case. However, compared with the Gauss-Newton method this
method brings about an additional difficulty. Equations (48) and (49),
respectively, also demand the solution of large-dimensional nonlinear equations.
The solution of (49) can be obtained in the case of a block-diagonal weighting
matrix $\Omega^{-1}$, because equation (49) then decomposes into $n$ subequations. There
only the $(p - q)$-vector $\xi_{i1}$ has to be determined (cf. (3.3.68)). Should this
be too expensive, the original Gauss-Newton method has to be chosen.
However, the asymptotic properties remain valid under the assumptions of
Section 3.5.4, because the fundamental convergence $\hat\xi_i - \xi_{i0} = O_p(m_i^{-1/2})$
remains valid in spite of the linearization that is carried out then.

A further possibility: the simple Gauss-Newton algorithm is carried out
up to the iteration $\pi_k$, $\mu_{(n)k}$; only in the final iteration step one applies the
modified method and obtains an estimate $\mu_{(n)k+1}$ of the experimental design
which, in contrast to the simple Gauss-Newton method, lies on the manifold
$S_{\pi_{k+1}}$.


3.8.3.5 Newton-Raphson procedures 


Unlike the Gauss-Newton method, the restriction s is not linearized here but 
a Taylor series expansion to second order is used for the objective function. 
Thus the method would also be applicable for the calculation of alternatives 


398 Chapter 3. Models with errors-in-variables 


to the WLSE, e.g. for MLE with nonnormally distributed errors. For poly- 
nomial relations this method was explained in Section 3.8.2; for explicit models 
MacDonald and Powell (1972) gave the respective algorithm, which is based 
on a special method to compute higher-order derivatives. 

We put y = [z, uw, A] for implicit models with the objective function (33), 
and y = [z, é] for explicit models by (34). As the approximated objective 
function we obtain 


: ; 
ke = ky + Oyky Aap -- 9 Aap Oy, yh Aoy (50) 


in the second iteration step. The stationary points of k, result from 


0 = ky = Ok, + pki Ay, (51) 
hence we have 

Yo = V1 — (G,yhi)* Oyky. (52) 
For explicit errors-in-variables models one obtains 

Ok = 2’ O20 6; C=2— p(€,2),; (53) 

Ayayle = (Oyb)’ Q-3 AC + ((Ap(Gpayo))’ Q-3 (54) 


It is obvious that the first term in (54) is just the information matrix Inf at 
the point y, (cf. (3.5.4) —(3.5.6)), and 0,¢ = —@,u(é, 2). 

This shows that the Newton-Raphson method adds a correction term to the 
Gauss-Newton method given in (41). With the currently available means, 
however, the inversion of $\partial_{\psi\psi} k_1$ for medium sample sizes seems 
to be possible only in the case of a block-diagonal weighting matrix. 

Provided that $\mu(\psi)$ is linear in $x$ and $\xi$, the correction term in (54) 
vanishes, and the Gauss-Newton and the Newton-Raphson methods are then 
equivalent for LIFU. A comprehensive discussion of modern procedures is given by 
Schwetlick and Tiller (1985, 1989). 
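
The relation between the two iterations can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not the algorithm of MacDonald and Powell: it assumes the objective k = (1/2) zeta' Q^{-1} zeta with zeta = z - mu(psi), an identity weighting matrix, and a hypothetical toy model (a line through the origin with errors in both coordinates). All function names and data are made up for illustration. The Newton-Raphson step uses the full second derivative as in (54), whereas the Gauss-Newton step keeps only the information-matrix term.

```python
import numpy as np


def gradient(psi, z, Q_inv, mu, jac):
    """Gradient of k(psi) = 0.5 * zeta' Q^{-1} zeta with zeta = z - mu(psi)."""
    zeta = z - mu(psi)
    # d zeta / d psi = -d mu / d psi, hence the minus sign.
    return -jac(psi).T @ Q_inv @ zeta


def gauss_newton_step(psi, z, Q_inv, mu, jac):
    """One step that keeps only the first (information-matrix) term of (54)."""
    J = jac(psi)
    info = J.T @ Q_inv @ J
    return psi - np.linalg.solve(info, gradient(psi, z, Q_inv, mu, jac))


def newton_raphson_step(psi, z, Q_inv, mu, jac, hess):
    """One step of type (52) using the full second derivative (54)."""
    J = jac(psi)
    w = Q_inv @ (z - mu(psi))
    # Correction term: second derivatives of zeta = -mu, weighted by Q^{-1} zeta.
    # hess(psi)[j] is the (p x p) Hessian of the j-th component of mu.
    correction = -np.einsum("j,jkl->kl", w, hess(psi))
    H = J.T @ Q_inv @ J + correction
    return psi - np.linalg.solve(H, gradient(psi, z, Q_inv, mu, jac))


# Hypothetical toy model: v = x * xi with errors in both coordinates;
# psi = [x, xi_1, ..., xi_n].
u = np.array([1.0, 2.0, 3.0, 4.0])   # observed abscissas (made-up data)
v = np.array([2.1, 3.9, 6.2, 7.8])   # observed ordinates (made-up data)
n = u.size
z = np.concatenate([u, v])
Q_inv = np.eye(2 * n)                 # identity weighting, an assumption


def mu(psi):
    x, xi = psi[0], psi[1:]
    return np.concatenate([xi, x * xi])


def jac(psi):
    x, xi = psi[0], psi[1:]
    J = np.zeros((2 * n, n + 1))
    J[:n, 1:] = np.eye(n)             # d xi_i / d xi_i
    J[n:, 0] = xi                     # d (x xi_i) / d x
    J[n:, 1:] = x * np.eye(n)         # d (x xi_i) / d xi_i
    return J


def hess(psi):
    T = np.zeros((2 * n, n + 1, n + 1))
    for i in range(n):
        # Only the mixed derivative d^2 (x xi_i) / (dx d xi_i) = 1 is nonzero.
        T[n + i, 0, 1 + i] = T[n + i, 1 + i, 0] = 1.0
    return T


psi0 = np.concatenate([[2.0], u])     # start: slope 2, xi = observed u
psi_gn = gauss_newton_step(psi0, z, Q_inv, mu, jac)
psi_nr = newton_raphson_step(psi0, z, Q_inv, mu, jac, hess)

psi = psi0.copy()
for _ in range(10):                   # a few full Newton-Raphson steps
    psi = newton_raphson_step(psi, z, Q_inv, mu, jac, hess)
grad_norm = np.linalg.norm(gradient(psi, z, Q_inv, mu, jac))
```

In this toy model mu is bilinear in x and xi, so the correction term does not vanish and the two steps differ. If mu is replaced by a map that is jointly linear in psi, hess returns zeros and both functions produce the same step.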


3.9 References 


The following reference list consists of two parts. The first is a detailed biblio- 
graphy on the problem of errors-in-variables models (including linear simul- 
taneous equations). The second contains all publications quoted in Chapter 3, 
which deal with other problems. 


3.9.1 References for errors-in-variables models 


Acton, F. S. (1959). Analysis of Straight-line Data. John Wiley, New York. 
Adcock, R. J. (1877). ‘Note on the method of least squares.’ The Analyst, 4, 183–184. 
Adcock, R. J. (1878). ‘A problem in least squares.’ The Analyst, 5, 53—54. 




Aigner, D. J. (1966). ‘Errors of measurement and least squares estimation in simple 
recursive model of dynamic equilibrium.’ Econometrica, 34, 424—432. 

Aitken, A. C. (1933). ‘On fitting polynomials to data with weighted and correlated errors.’ 
Proc. Roy. Soc. Edinburgh, 54, 12—16. 

Allen, R. G. D. (1939). ‘The assumptions of linear regression.’ Economica, N.S., 6, 
199–204. 

Anderson, T. W. (1951a). ‘Estimating linear restrictions of regression coefficients for 
multivariate normal distributions.’ Ann. Math. Statist., 22, 327—351. 

Anderson, T. W. (1951b). ‘The asymptotic distribution of certain characteristic vectors.’ 
Proc. Second Berkeley Symp. Math. Statist. Prob., University of California Press, 
Berkeley. 

Anderson, T. W. (1974). ‘An asymptotic expansion of the distribution of the limited 
information maximum likelihood estimate of a coefficient in a simultaneous equation 
system.’ J. Amer. Statist. Assoc., 69, 565—573. 

Anderson, T. W. (1976). ‘Estimation of linear functional relationships: approximate 
distributions and connections with simultaneous equations in econometrics (with 
discussion).’ J. Roy. Statist. Soc., Ser. B, 38, 1—36. 

Anderson, T. W. (1977). ‘Asymptotic expansions of the distributions of estimates in 
simultaneous equations for alternative parameter sequences.’ Econometrica, 45, 
509–518. 

Anderson, T. W. (1984). ‘Estimating linear statistical relationships.’ Ann. Statist., 12, 
1—45. 

Anderson, T. W., and Rubin, H. (1949). ‘Estimation of the parameters of a single equation 
in a complete system of stochastic equations.’ Ann. Math. Statist., 20, 46—63. 

Anderson, T. W., and Rubin, H. (1950). ‘The asymptotic properties of estimates of the 
parameters of a single equation in a complete system of stochastic equations.’ Ann. 
Math. Statist., 21, 570—582. 

Anderson, T. W., and Rubin, H. (1956). ‘Statistical inference in factor analysis.’ Proc. 
3rd Berkeley Symp. Math. Statist. Prob., 5, 111—150. University of California Press, 
Berkeley. 

Anderson, T. W., and Sawa, T. (1973). ‘Distributions of estimates of coefficients of a 
single equation in a simultaneous system and their asymptotic expansions.’ 
Econometrica, 41, 683–714. 

Attfield, C. L. F. (1980). ‘Testing the assumptions of the permanent-income model.’ 
J. Amer. Statist. Assoc., 75, no. 369, 32—38. 

Austen, A. E. W., and Pelzer, H. (1946). ‘Linear curves of best fit.’ Nature, 157, 693—694. 

Bamberg, G., and Emrich, O. (1970). ‘Lineare Regression mit kumulativem Fehler in der 
unabhängigen Variablen.’ Operations Research Verfahren, 8, 1–14. 

Banerjee, K. S., and Nair, K. R. (1948). ‘A note on fitting of straight lines if both 
variables are subject to error.’ Sankhya, 6, 331. 

Bard, Y. (1974). Nonlinear Parameter Estimation. Academic Press, New York. 

Barnett, V. D. (1967). ‘A note on linear structural relationships when both residual 
variances are known.’ Biometrika, 54, 670—672. 

Barnett, V. D. (1969). ‘Simultaneous pairwise linear structural relationships.’ Biometrics, 
25, 129—142. 

Barnett, V. D. (1970). ‘Fitting straight lines — the linear functional relationships with 
replicated observations.’ Appl. Statistics, 19, 135–144. 

Bartlett, M. S. (1949). ‘Fitting a straight line when both variables are subject to error.’ 
Biometrics, 5, 207–212. 

Bartlett, M. S. (1957). ‘A note on tests of significance for linear functional relationships.’ 
Biometrika, 44, 268–269. 

Barton, D. E., and David, F. N. (1960). ‘Models of functional relationships illustrated 
on astronomical data.’ Bull. Intern. Statist. Inst., 87, 9—33. 




Basman, R. L. (1961). ‘A note on the exact finite sample frequency functions of genera- 
lized classical linear estimators in two leading overidentified cases.’ J. Amer. Statist. 
Assoc., 56, 619 —636. 

Basman, R. L. (1963). ‘A note on the exact finite sample frequency functions of 
generalized classical linear estimators in a leading three equation case.’ J. Amer. Statist. Assoc., 
58, 161–171. 

Basu, A. P. (1969). ‘On some tests for several linear relations.’ J. Roy. Statist. Soc., 31, 
65—71. 

Berkson, J. (1950). ‘Are there two regressions?’ J. Amer. Statist. Assoc., 45, 164—180. 

Bhargava, A. K. (1977). ‘Maximum likelihood estimation in a multivariate errors in 
variables regression model with unknown error covariance matrix.’ Commun. Statist.- 
Theor. Meth., A 6, 587—601. 

Birch, M. W. (1964). ‘A note on the maximum likelihood estimation of a linear 
structural relationship.’ J. Amer. Statist. Assoc., 59, 1175–1178. 

Blalock, H. M., Jr. (1969). ‘Estimating measurement error using multiple indicators 
and several points in time.’ Amer. Sociological Review, 35, 101—111. 

Blalock, H. M., Jr., Carter, L. F., and Wells, C. S. (1970). ‘Statistical estimation with 
random measurement errors.’ In: Sociological Methodology (Eds. L. F. Borgatta and 
G. W. Bohrnstedt). Jossey-Bass, Inc., San Francisco, 75–103. 

Booth, G. D. (1973). The errors-in-variables model when the covariance matrix is not 
constant. Ph. D. Thesis, Iowa State Univ. 

Bower, D. R., and Swindel, B. F. (1972). ‘Rounding errors in the independent variables 
in a general linear model.’ Technometrics, 14, 215–218. 

Brennan, J. F., and Housner, G. W. (1948). ‘Estimation of linear trends.’ Ann. Math. 
Statist., 19, 380–388. 

Briggs, F. H. A. (1962). ‘The influence of errors on the correlation of ratios.’ Econometrica, 
30, 162–177. 

Britt, H. I., and Luecke, R. H. (1973). ‘The estimation of parameters in nonlinear, implicit 
models.’ Technometrics, 15, 233–247. 

Brooks, C., Compston, W., McIntyre, G. A., and Turek, A. (1966). ‘The statistical assess- 
ment of Rb-Sr Isochrons.’ J. Geophysical Research, 71, 5459—5468. 

Brown, G. H. (1978a). ‘Generalized least squares applied to the linear ultrastructural 
model.’ Biometrika, 65, 441—444. 

Brown, G. H. (1978b). ‘Calibration with an ultra-structural model.’ Appl. Statistics, 27 (1), 
47—51. 

Brown, G. H., Kadane, J. B., and Ramage, J. G. (1975). ‘The asymptotic bias and the 
mean-squared error of double k-class estimators when the disturbances are small.’ 
Intern. Econ. Rev., 15, 667–679. 

Brown, R. L. (1957). ‘Bivariate structural relation.’ Biometrika, 44, 84—96. 

Brown, R. L., and Fereday, F. (1958). ‘Multivariate linear structural relations.’ Bio- 
metrika, 45, 146—153. 

Brundy, I. M., and Jorgenson, D. W. (1971). ‘Efficient estimation of simultaneous 
equations by instrumental variables.’ Review of Economics and Statistics, 58, 207–226. 

Carlson, F. D., Sobel, H., and Watson, G. S. (1966). ‘Linear relationships between 
variables affected by error.’ Biometrics, 22, 252–267. 

Carter, L. F., see Blalock, H. M. 

Carter, R. L., and Fuller, W. A. (1980). ‘Instrumental variable estimation of the simple 
errors-in-variables model.’ J. Amer. Statist. Assoc., 75, (371) 687—692. 

Casson, M. C. (1973). ‘Linear regression with error in the deflating variable.’ Econome- 
trica, 41, 751—759. 

Casson, M. C. (1974). ‘Generalized errors in variables regression.’ Review of Economic 
Studies, 41, 347—352. 




Celmins, A. (1973). ‘Least squares adjustment with finite residuals for nonlinear con- 
straints and partially correlated data.’ Trans. 19th Conf. Army Mathematicians, 
Orlando, Florida. Part 2, 809—858 (US Army Research Office, Report No. 73-3). 

Chan, L. K., and Mak, T. K. (1979a). ‘On the maximum likelihood estimation of a linear 
structural relationship when the intercept is known.’ J. Multivariate Analysis, 9, 
304–313. 

Chan, L. K., and Mak, T. K. (1979b). ‘Maximum likelihood estimation of a linear 
structural relationship with replication.’ J. Roy. Statist. Soc., Ser. B, 41, 263–268. 

Chan, L. K., and Mak, T. K. (1984). ‘Maximum likelihood estimation in multivariate 
structural relationships.’ Scand. J. Statist., 11, 45–50. 

Chan, N. N. (1965). ‘On circular functional relationship.’ J. Roy. Statist. Soc., Ser. B, 
27, 45–56. 

Chan, N. N. (1977). ‘On an unbiased predictor in factor analysis.’ Biometrika, 64, 
642–644. 

Chan, N. N., and Mak, T. K. (1984). ‘Heteroscedastic errors in a linear functional rela- 
tionship.’ Biometrika, 71, 212—215. 

Clutton-Brock, M. (1967). ‘Likelihood distributions for estimating functions when both 
variables are subject to error.’ Technometrics, 9, 261–269. 

Cochran, W. G. (1937). ‘Problems arising in the analysis of a series of similar experi- 
ments.’ J. Roy. Statist. Soc., Suppl. 4, 102—118. 

Cohen, J. E., and D’Eustachio, P. (1978). ‘An affine linear model for the relation between 
two sets of frequency counts. With response by W. A. Fuller.’ Biometrics, 34, 514—521. 

Compston, W., see Brooks, C. 

Cook, W. R. (1931). ‘On curve fitting by means of least squares.’ Philosophical Magazine, 
Ser. 7, 12, 1025—1059. 

Copas, J. B. (1972). ‘The likelihood surface in the linear functional relationship problem.’ 
J. Roy. Statist. Soc., Ser. B, 34, 274—278. 

Cox, N. R. (1976). ‘The linear structural relation for several groups of data.’ Biometrika, 
63, 231—237. 

Cox, N. R., and Dolby, G. R. (1977). ‘Corrections and Amendments (to Cox (1976) and 
Dolby (1976a)).’ Biometrika, 64, 427. 

Creasy, M. A. (1956). ‘Confidence limits for the gradient in the linear functional rela- 
tionship.’ J. Roy. Statist. Soc., Ser. B, 18, 65—69. 

David, F. N., see Barton, D. E. 

Davies, R. B., and Hutton, B. (1975). ‘The effect of errors in the independent variables 
in linear regression.’ Biometrika, 62, 383—391. 

DeGracie, J. S. (1960). ‘Analysis of covariance when the concomitant variable is 
measured with error.’ Unpublished Ph. D. Thesis, Iowa State University, Ames, Iowa. 

DeGracie, J. S., and Fuller, W. A. (1972). ‘Estimation of the slope and analysis of 
covariance when the concomitant variable is measured with error.’ J. Amer. Statist. 
Ass., 67, 930–937. 

D’Eustachio, P., see Cohen, J. E. 

Deming, W. E. (1931). ‘The application of least squares.’ Philosophical Magazine, Ser. 7, 
11, 146—158. 

Deming, W. E. (1943). Statistical Adjustment of Data. J. Wiley, New York. 

Dent, B. M. (1935). ‘On observation of points connected by a linear relation.’ Proc. 
Physical Soc. London, 47, 92—108. 

Dolby, G. R. (1972). ‘Generalized least squares and maximum likelihood estimation 
of nonlinear functional relationships.’ J. Roy. Statist. Soc., Ser. B, 34, 393—400. 

Dolby, G. R. (1976a). ‘The ultrastructural relation: a synthesis of the functional and 
structural relations.’ Biometrika, 63, 39–50. 

Dolby, G. R. (1976b). ‘The connection between methods of estimation in implicit and 
explicit nonlinear models.’ Appl. Statistics, 25, 157–162. 




Dolby, G. R. (1976c). ‘Estimation of ultrastructural relations: a synthesis of the functional 
and structural models and factor analysis.’ Bull. Australian Math. Soc., 14, 473–476. 

Dolby, G. R. (1976d). ‘A note on the linear structural relation when both residual 
variances are known.’ J. Amer. Statist. Assoc., 71, 352–353. 

Dolby, G. R., see also Cox, N. R. 

Dolby, G. R., and Freeman, T. G. (1975). ‘Functional relationships having many 
independent variables and errors with multivariate normal distribution.’ J. Multivariate 
Analysis, 5, 466–479. 

Dolby, G. R., and Lipton, S. (1972). ‘Maximum likelihood estimation of the general non- 
linear functional relationship with replicated observations and correlated errors.’ 
Biometrika, 59, 121—129. 

Dolby, G. R., and Ratkowsky, D. A. (1975). ‘Taylor series linearization and scoring for 
parameters in nonlinear regression.’ Appl. Statist., 24, 109—111. 

Dorff, M. R. (1960). ‘Large and small sample properties of estimators for a linear func- 
tional relation.’ Unpublished Ph. D. Thesis, Iowa State University, Ames, Iowa. 

Dorff, M. R., and Gurland, J. (1961a). ‘Small sample behavior of slope estimators in 
a linear functional relation.’ Biometrics, 17, 283—298. 

Dorff, M. R., and Gurland, J. (1961b). ‘Estimation of the parameters of a linear 
functional relation.’ J. Roy. Statist. Soc., Ser. B, 23, 160–170. 

Drion, E. F. (1951). ‘Estimation of the parameters of a straight line and of the variances 
of the variables if they are both subject to error.’ Indagationes Mathematicae, 13, 
256–260. 

Durbin, J. (1954). ‘Errors in variables.’ Revue de l’Institut International de Statistique, 
22, 23–32. 

Efron, B. (1962). ‘The fitting of straight lines when both variables are subject to errors 
and the ranks of the means are known.’ Techn. Report, Rand Corporation, Mimeograph. 

Egerton, M. F., and Laycock, P. J. (1979). ‘Maximum likelihood estimation of multi- 
variate nonlinear functional relationships.’ Math. Operationsforsch. Statist., ser. 
statist., 10, 273—280. 

El-Sayyad, G. M., and Lindley, D. V. (1968). ‘The Bayesian estimation of a linear func- 
tional relationship.’ J. Roy. Statist. Soc., Ser. B, 30, 190—202. 

Emrich, O., see Bamberg, G. 

Farebrother, R. W. (1976). Discussion to Anderson, T. W. (1976). 

Fedorow, W. W. (1974). ‘Regression problems with controllable variables subject to 
error.’ Biometrika, 61, 49–56. 

Feldstein, M. (1973). ‘Multicollinearity and the mean square error of alternative 
estimators.’ Econometrica, 41, 337–345. 

Feldstein, M. (1974). ‘Errors in variables: a consistent estimator with smaller MSE in 
finite samples.’ J. Amer. Statist. Assoc., 69, 990–996. 


Fereday, F., see Brown, R. L. 

Fisher, R. A. (1938). ‘The statistical utilization of multiple measurements.’ Ann. Eugen., 
8, 376—386. 

Florens, J.-P., Mouchart, M., and Richard, J. F. (1974). ‘Bayesian inference in error-in- 
variables models.’ J. Multivariate Analysis, 4, 419—452. 

Florens, J.-P., Mouchart, M., and Richard, J. F. (1976). ‘Likelihood analysis of linear 
models.’ CORE discussion paper, Centre for Operations Research and Econometrics, 
Université Catholique de Louvain, Louvain. 

Florens, J.-P., Mouchart, M., and Richard, J. F. (1979). ‘Specification and inference 
in linear models.’ CORE discussion paper 7943, Centre for Operations Research and 
Econometrics, Université Catholique de Louvain, Louvain. 

Forsythe, G. E. (1957). ‘Generation and use of orthogonal polynomials for data fitting 
with a digital computer.’ J. Soc. Indust. Appl. Math., 5, 74—88. 

Freeman, T. G., see Dolby, G. R. 




Frisch, R. (1934). Statistical Confluence Analysis by Means of Complete Regression Systems. 
University Institute of Economics, Oslo. 

Fujikoshi, Y., and Veitch, L. G. (1979). ‘Estimation of dimensionality in canonical 
correlation analysis.’ Biometrika, 66, 345—351. 


Fuller, W. A. (1971). ‘Properties of estimators in the errors-in-variables model.’ 
Contribution to the 1971 annual meeting of Econometric Society. 
Fuller, W. A. (1977). ‘Some properties of a modification of the limited information 
estimator.’ Econometrica, 45, 939–954. 
Fuller, W. A. (1980). ‘Properties of some estimators for the errors-in-variables model.’ 
Ann. Statist., 8, 407–422. 
Fuller, W. A., and Hidiroglou, M. A. (1978). ‘Regression estimation after correction for 
attenuation.’ J. Amer. Statist. Assoc., 73, 99–104. 
Fuller, W. A., Warren, R. D., and White, J. K. (1974). ‘An error-in-variables analysis 
of managerial role performance.’ J. Amer. Statist. Assoc., 69, 886–893. 
Fuller, W. A., and Wolter, K. M. (1977). ‘Estimation of the quadratic errors-in-variables 
model.’ Manuscript, Statistical Laboratory, Iowa State University, 4-77. 
Fuller, W. A., and Wolter, K. M. (1982). ‘Estimation of nonlinear errors-in-variables 
models.’ Ann. Statist., 10, 539–548. 
Fuller, W. A., see also Carter, R. L. 
Fuller, W. A., see also DeGracie, J. S. 
Geary, R. C. (1942). ‘Inherent relations between random variables.’ Proc. Roy. Irish 
Academy, Ser. A, 47, 63—76. 
Geary, R. C. (1943). ‘Relations between statistics: the general and the sampling problem 
when the samples are large.’ Proc. Roy. Irish Academy, Ser. A, 49, 177–196. 
Geary, R. C. (1948). ‘Studies in relations between economic time series.’ J. Roy. Economic 
Society, 10, 1–19. 
Geary, R. C. (1949). ‘Determination of linear relations between systematic parts of 
variables with errors of observation the variances of which are unknown.’ Econometrica, 
17, 30–58. 
Geary, R. C. (1953). ‘Nonlinear functional relationship between two variables when one 
variable is controlled.’ J. Amer. Statist. Assoc., 48, 94–103. 
Geraci, V. J. (1976). ‘Identification of simultaneous equation models with measurement 
error.’ J. Econometrics, 4, 262—283. 
Geraci, V. J. (1977). ‘Estimation of simultaneous equation models with measurement 
error.’ Econometrica, 45, 1243–1257. 
Gleser, L. J. (1985). ‘A note on G. R. Dolby’s unreplicated ultrastructural model.’ 
Biometrika, 72, 117—124. 
Gibson, W. M., and Jowett, G. H. (1957). ‘Three-group regression analysis. Part I. 
Simple regression analysis.’ Appl. Statist., 6, 114–122. 
Gini, C. (1921). ‘Sull’ interpolazione di una retta quando i valori della variabile indi- 
pendente sono affetti da errori accidentali.’ Metron, 1, 65—82. 
Gleser, L. J., and Watson, G. S. (1973). ‘Estimation of a linear transformation.’ 
Biometrika, 60, 525–534. 
Goldberger, A. S. (1972). ‘Maximum-likelihood estimation of regressions containing 
unobservable independent variables.’ Int. Economic Review, 13 (1), 1–15. 
Grether, D. M., and Maddala, G. S. (1973). ‘Errors in variables and serially correlated 
disturbances in distributed lag models.’ Econometrica, 41, 255–262. 
Griliches, Z. (1974). ‘Errors in variables and other unobservables.’ Econometrica, 42, 
971–998. 
Griliches, Z., and Ringstad, V. (1970). ‘Errors-in-the-variables bias in nonlinear 
contexts.’ Econometrica, 38, 368–370. 
Grubbs, F. E. (1948). ‘On estimating precision of measuring instruments and product 
variability.’ J. Amer. Statist. Ass., 43, 243–264. 




Guarian, J., and Halperin, M. (1971). ‘A note on estimation in straight line regression 
when both variables are subject to error.’ J. Amer. Statist. Ass., 66, 587—589. 

Gurland, J., see Dorff, M. R. 

Haldane, J. B. S., see Kermack, K. A. 

Halliday, A. N., and Titterington, D. M. (1979). ‘On the fitting of parallel isochrons and 
the method of maximum likelihood.’ Manuscript, Isotope Geology Unit, Scottish 
Universities Research and Reactor Centre and Dept. of Statistics, Univ. of Glasgow. 

Halperin, M. (1961). ‘Fitting of straight lines and prediction when both variables are 
subject to error.’ J. Amer. Statist. Assoc., 56, 657—669. 

Halperin, M. (1964). ‘Interval estimation in linear regression when both variables are 
subject to error.’ J. Amer. Statist. Assoc., 59, 1112—1120. 

Halperin, M., see also Guarian, J. 

Hannan, E. J. (1967). ‘Canonical correlation and multiple equation systems in 
economics.’ Econometrica, 34, 123–128. 

Healy, J. D. (1980). ‘Maximum-likelihood estimation of a multivariate linear functional 
relationship.’ J. Multivariate Anal., 10, 243—251. 

Hemelrijk, J. (1949a). ‘Construction of a confidence region for a line.’ Nederland Aka- 
demie Wetenschapten, Proceedings, 52, 374—384. 

Hemelrijk, J. (1949b). ‘Construction of a confidence region for a line.’ Indagationes 
Mathematicae, 11, 374—384. 

Hendry, D. F. (1976). ‘The structure of simultaneous equations estimators.’ J. Econometrics, 
4, 51–88. 

Hendry, D. F., and Srba, F. (1977). ‘The properties of autoregressive instrumental 
variables estimators in dynamic systems.’ Econometrica, 45, 969–989. 

Hey, E. N., and Hey, M. H. (1960). ‘The statistical estimation of a rectangular 
hyperbola.’ Biometrics, 16, 606–617. 

Hidiroglou, M. A., see Fuller, W. A. 

Hodges, S. D., and Moore, P. G. (1972). ‘Data uncertainties and least squares regres- 
sion.’ Appl. Statistics, 21, 185—195. 

Hooper, J. W., and Theil, H. (1958). ‘The extension of Wald’s method of fitting straight 
lines to multiple regression.’ Rev. Intern. Statist. Institute, 26. 

Höschel, H.-P. (1978a). ‘Generalized least-squares estimators of linear functional 
relations with known error-covariance.’ Math. Operationsforsch. Statist., ser. statist., 9, 
9–26. 

Höschel, H.-P. (1978b). ‘Least-squares and maximum likelihood estimation of functional 
relations.’ In: Transact. of the 8th Prague Conference on Information Theory, Statistical 
Decision Functions and Random Processes, Vol. A, 305–317. Academia, Prague. 

Höschel, H.-P. (1979). ‘A special counterpart to Sard’s theorem and consequences for 
the uniqueness and stochastic convergence of least squares procedures in nonlinear 
data-fitting problems.’ Paper presented at the 4th International Summer School on 
Model Choice, May 1979, Mühlhausen, GDR. Preprint 34/1979, Humboldt-Universität 
Berlin, Sektion Mathematik. 

Höschel, H.-P. (1986). ‘Uniqueness of surfaces and global identifiability in nonlinear 
regression.’ Statistics, 17, 15–24. 

Höschel, H.-P., and Penev, S. (1980). ‘Least-squares curve-fitting for nonlinear models 
with errors in variables and globally convergent Gauss-Newton procedures.’ 
Université catholique de Louvain. Center for Op. Res. and Econometrics. Discussion 
paper 8025. 

Hotelling, H., and Working, H. (1929). ‘Application of the theory of error to the inter- 
pretation of trends.’ J. Amer. Statist. Assoc., Suppl. 73—79. 

Housner, G. W., see Brennan, J. F. 

Van Houwelingen, J. C. (1978). ‘Linear function relationships with unequal covariance 
matrices.’ Preprint 85, Dept. Math., Univ. of Utrecht. 




Van Houwelingen, J. C., and Schipper, R. M. (1981). ‘The efficiency of a test based on 
the asymptotic distribution of the MLE for a linear functional relationship.’ Math. 
Operationsforsch. Statist., ser. statist., 12, 21—30. 

Hutton, B., see Davies, R. B. 

Izenman, A. J. (1975). ‘Reduced-rank regression for the multivariate linear model.’ 
J. Multivariate Anal., 5, 248—264. 

Jeeves, T. A. (1954). ‘Identification and estimation of linear manifolds in n-dimensions.’ 
Ann. Math. Statist., 25, 714–723. 

Jessop, W. N. (1952). ‘One line or two?’ Appl. Statistics, 2, 131–137. 

Johnston, J. (1963). Econometric Methods. McGraw-Hill, New York. 

Jorgenson, D. W., see Brundy, I. M. 

Jowett, G. H., see Gibson, W. M. 

Joyner, M. C., see Seares, F. H. 

Kabe, D. G. (1970). ‘On the exact distributions of the GCL (generalized classical linear) 
estimators in a leading three equation case.’ J. Amer. Statist. Assoc., 65, 182–185. 

Kadane, J. B. (1970). ‘Testing overidentifying restrictions when the disturbances are 
small.’ J. Amer. Statist. Assoc., 65, 182–185. 

Kadane, J. B. (1971). ‘Comparison of k-class estimators when the disturbances are small.’ 
Econometrica, 39, 723–737. 

Kadane, J. B., see also Brown, G. H. 

Kagan, A. M., Linnik, J. W., and Rao, C. R. (1973). Characterization Problems in 
Mathematical Statistics. John Wiley, New York, ch. 10. 

Kapteyn, A., and Wansbeek, T. (1984). ‘Errors in variables: consistent adjusted least 
squares (CALS) estimation.’ Comm. Statist. A - Theory Methods, 13, 1811–1837. 

Karni, E., and Weissman, I. (1974). ‘A consistent estimator of the slope in a regression 
model with errors in the variables.’ J. Amer. Statist. Assoc., 69, 211–213. 

Keeping, E. S. (1956). ‘Note on Wald’s method of fitting a straight line when both 
variables are subject to error.’ Biometrics, 12, 445–448. 

Kendall, M. G. (1951). ‘Regression, structure and functional relationship, part 1.’ 
Biometrika, 38, 11—25. 

Kendall, M. G. (1952). ‘Regression, structure and functional relationship, part II.’ 
Biometrika, 39, 96—108. 

Kendall, M. G., and Stuart, A. (1961). The Advanced Theory of Statistics, Vol. 2. Griffin, 
London, ch. 29. 

Kermack, K. A., and Haldane, J. B. S. (1950). ‘Organic correlation and allometry.’ 
Biometrika, 37, 30—41. 

Kiefer, J. (1964). ‘Review of Kendall and Stuart’s Advanced theory of statistics II.’ 
Ann. Math. Statist., 35, 1371–1380. 

Kiefer, J., and Wolfowitz, J. (1956). ‘Consistency of the maximum likelihood estimator 
in the presence of infinitely many incidental parameters.’ Ann. Math. Statist., 27, 
887—906. 

Koopmans, T. C. (1932). Linear Regression Analysis of Economic Time Series. De Erven, 
F. Bohn N. V., Haarlem; 2nd ed. 1937. 

Koopmans, T. C., and Reiersol, O. (1950). ‘The identification of structural characteris- 
tics.’ Ann. Math. Statist., 21, 165—181. 

Kruskal, W. H. (1953). ‘On the uniqueness of the line of organic correlation.’ Biometrics, 
9, 47—58. 

Kummel, C. H. (1879). ‘Reduction of observation equations which contain more than 
one observed quantity.’ The Analyst (Des Moines) 6, 97—105. 

Kunitomo, N. (1980). ‘Asymptotic expansions of the distribution of estimators in a linear 
functional relationship and simultaneous equations.’ J. Amer. Statist. Assoc., 75 (371), 
693–700. 




Kunitomo, N., and Morimune, K. (1980). ‘Improving the maximum likelihood estimate 
in linear functional relationships for alternative parameter sequences.’ J. Amer. 
Statist. Assoc., 75 (369) 230—237. 

Laha, R. G. (1957). ‘On some characterization problems connected with linear 
structural relations.’ Ann. Math. Statist., 28, 405–414. 

Laycock, P. J., see Egerton, M. F. 

Leamer, E. E. (1978). ‘Least-squares versus instrumental variables estimation in a 
simple errors-in-variables model.’ Econometrica, 46, 961–968. 

Levi, M. D. (1973). ‘Errors in the variables bias in the presence of correctly measured 
variables.’ Econometrica, 41, 985–986. 

Lindley, D. V. (1947). ‘Regression lines and the linear functional relationship.’ J. 
Roy. Statist. Soc., Suppl. 9, 218—244. 

Lindley, D. V. (1953). ‘Estimation of a functional relationship.’ Biometrika, 40, 47—49. 

Lindley, D. V., see also El-Sayyad, G. M. 

Linnik, J. W., see Kagan, A. M. 

Linssen, H. N. (1980). ‘Functional relationships and minimum sum estimation.’ Disser- 
tation, University of Technology Eindhoven. 

Lipton, S., see Dolby, G. R. 

Liviatan, N. (1961). ‘Errors in variables and Engel curve analysis.’ Econometrica, 29, 
336–362. 

Lord, F. M. (1960). ‘Large-sample covariance analysis when the control variable is 
fallible.’ J. Amer. Statist. Assoc., 55, 397–421. 

Luecke, R. H., see Britt, H. I. 

Lyttkens, E. (1977). ‘Schätzung mit Hilfe von Instrumentvariablen.’ Math. 
Operationsforsch. Statist., ser. statist., 8, 173–198. 

Maasoumi, E. (1978). ‘A modified Stein-like estimator for the reduced form coefficients 
of simultaneous equations.’ Econometrica, 46, 695–704. 

McCallum, B. T. (1972). ‘Relative asymptotic bias from errors of omission and 
measurement.’ Econometrica, 40 (4), 757–758. 

MacDonald, J. R., and Powell, D. R. (1972). ‘A rapidly convergent iterative method for 
the solution of the generalized nonlinear least-squares problem.’ Comput. J., 15, 
148—155. 

McIntyre, G. A., see Brooks, C. 

Madansky, A. (1959). ‘The fitting of straight lines when both variables are subject to 
error.’ J. Amer. Statist. Assoc., 54, 173—205. 

Maddala, G. S., see Grether, D. M. 

Mak, T. K. (1983). ‘On Sprent’s generalized least-squares estimator.’ J. Roy. Statist. 
Soc. Ser. B, 45, 380—383. 

Mak, T. K., see Chan, L. K. 

Mak, T. K., see Chan, N. N. 

Malinvaud, E. (1966). Statistical Methods of Econometrics. North Holland Publishing 
Company, Amsterdam, ch. 10. 

Mallison, J. R., and Theobald, C. M. (1978). ‘Comparative calibration, linear structural 
relationships and congeneric measurements.’ Biometrics, 34, 39—45. 

Mariano, R. S. (1972). ‘The existence of moments of the ordinary least squares 
estimators.’ Econometrica, 40, 643–652. 

Mariano, R. S. (1973). ‘Approximations of the distribution functions of the ordinary 
least-squares and two-stage least-squares estimators in the case of two included 
endogenous variables.’ Econometrica, 41, 67–77. 

Mariano, R. S. (1975). ‘Some large-concentration-parameter asymptotics for the k-class 
estimators.’ J. Econometrics, 3, 171–177. 




Mariano, R. S., and Sawa, T. (1972). ‘The exact finite-sample distributions of the 
limited-information maximum likelihood estimator in the case of two included endo- 
genous variables.’ J. Amer. Statist. Assoc., 67, 159—163. 

Mehra, R. K. (1976). ‘Identification and estimation of the errors-in-variables model 
(EVM) in structural form.’ Math. Programming Study, 5, 191—210. 

Mikhail, W. M., and Sargan, J. D. (1971). ‘A general approximation to the distribution 
of instrumental variables estimates.’ Econometrica, 39, 131–169. 

Moberg, L., and Sundberg, R. (1978). ‘Maximum likelihood estimation of a linear func- 
tional relationship when one of the departure variance is known.’ Scandinavian J. 
Statist., 5, 61—64. 

Moore, P. G., see Hodges, S. D. 

Moran, P. A. P. (1956). ‘A test of significance for an unidentifiable relation.’ J. Roy. 
Statist. Soc., Ser. B, 18, 61—64. 

Moran, P. A. P. (1971). ‘Estimating structural and functional relationships.’ J. Multi- 
variate Anal., 1, 232—255. 

Morimune, K., see Kunitomo, N. 

Mouchart, M., see Florens, J.-P. 

Murray, W., see Gill, P. E. 

Nagar, A. L. (1959). ‘The bias and moment matrix of the general k-class estimators of 
the parameters in simultaneous equations.’ Econometrica, 27, 575–595. 

Nair, K. R., and Srivastava, M. P. (1942). ‘On a simple method of curve fitting.’ Sankhya, 
6, 121–132. 

Nair, K. R., see also Banerjee, K. S. 

Neyman, J. (1937). ‘Remarks on a paper by E. C. Rhodes.’ J. Roy. Statist. Soc., 100, 
50—57. 

Neyman, J. (1949). ‘Existence of consistent estimates of linear structural relation of 
two variables.’ Mimeograph, Stat. Lab., Univ. of California, Berkeley, August 23, 1949. 

Neyman, J. (1951). ‘Existence of consistent estimates of the directional parameter in a 
linear structural relation between two variables.’ Ann. Math. Statist., 22, 497—512. 

Neyman, J., and Scott, E. L. (1948). ‘Consistent estimates based on partially consistent 
observations.’ Econometrica, 16, 1–32. 

Neyman, J., and Scott, E. L. (1951). ‘On certain methods of estimating the linear 
structural relation.’ Ann. Math. Statist., 22, 352–361. Correction: Ann. Math. Statist., 
23 (1952), 115. 

Nowak, E. (1975). ‘Konsistente Schätzung sämtlicher Parameter einer Zeitreihen- 
Regression bei Fehlern in den Variablen, wenn neben den Daten der Zeitreihe keine 
zusätzlichen Kenntnisse verfügbar sind.’ Forschungsberichte aus dem Institut für 
Statistik und Wissenschaftstheorie der Universität München, Serie Oe, Nr. 3. 

Nowak, E. (1976). ‘Identification of error models with time series data and no additional 
knowledge given.’ Forschungsberichte aus dem Institut für Statistik und 
Wissenschaftstheorie der Universität München, Serie Oe, Nr. 6a. 

Nowak, E. (1977). ‘Identifikation von stochastischen Modellen der Zeitreihenanalyse 
bei Fehlern in den Variablen.’ Forschungsberichte aus dem Institut für Statistik und 
Wissenschaftstheorie der Universität München, Serie Oe, Nr. 8. 

Nussbaum, M. (1976). ‘Maximum likelihood and least squares estimation of linear func- 
tional relationships’. Math. Operationsforsch. Statist., 7, 23—49. 

Nussbaum, M. (1977). ‘Asymptotic optimality of estimators of a linear functional 
relation if the ratio of the error variances is known.’ Math. Operationsforsch. Statist., 
ser. statist., 8, 173—198. 

Nussbaum, M. (1978a). ‘Schätzung linearer funktioneller Beziehungen.’ Dissertation,
Akademie der Wissenschaften der DDR, Zentralinstitut für Mathematik und Mechanik,
Berlin.


Nussbaum, M. (1978b). ‘Asymptotic efficiency of estimators of a multivariate linear
functional relation.’ Wissenschaftl. Sitzungen zur Stochastik, Akademie der
Wissenschaften der DDR, Zentralinstitut für Mathematik und Mechanik, WSS 03/78.

Nussbaum, M. (1984). ‘An asymptotic minimax risk bound for estimation of a linear
functional relationship.’ J. Multivariate Anal., 14, 300—314.

O'Neill, M., Sinclair, L. G., and Smith, F. J. (1969). ‘Polynomial curve fitting when
abscissas and ordinates are both subject to error.’ Computer J., 12, 52—56.

Ord, J. K. (1969). ‘A new approach to the estimation of parameters in linear functional 
relationships.’ Bull. Internat. Statist. Inst., 48, Book 2, 169—171. 

Patefield, W. M. (1976). ‘On the validity of approximate distributions arising in fitting 
a linear functional relationship.’ J. Statist. Comput. Simul., 5, 43—60. 

Patefield, W. M. (1977a). ‘On the information matrix in the linear functional relation- 
ship problem.’ Applied Statistics, 26, 69—70. 

Patefield, W. M. (1977b). ‘Determining the precision of estimators in problems of 
constrained likelihood inference.’ Sankhya, B39, 316—328. 

Patefield, W. M. (1978). ‘The unreplicated ultrastructural relation, large sample pro- 
perties.’ Biometrika, 65, 535—540. 

Pázman, A. (1984). ‘Nonlinear least-squares-uniqueness versus ambiguity.’ Math.
Operationsforsch. u. Statist., ser. statist., 15, 323—336.

Pearson, K. (1901). ‘On lines and planes of closest fit to systems of points in space.’ 
Philosophical Magazine, 2, 559—572. 

Pelzer, H., see Austen, A. E. W.

Penev, S., see Höschel, H.-P.

Phillips, P. C. B. (1977). ‘A general theorem in the theory of asymptotic expansions as
approximations to the finite sample distribution of econometric estimators.’
Econometrica, 45, 1517—1535.

Powell, D. R., see MacDonald, J. R. 

Prato, A. A. (1970). ‘Measurement of citrus demands when all variables are subject to 
error.’ J. Amer. Statist. Assoc., 65, 1146—1158. 

Ramage, F. G., see Brown, G. H. 

Rao, C. R. (1966). ‘Characterization of the distribution of random variables in linear
structural relations.’ Sankhya, A28, 251—260.

Rao, C. R., see also Kagan, A. M. 

Ratkowsky, D. A., see Dolby, G. R. 

Reed, A. H., and Wu, G. T. (1977). ‘Estimation of bias in classical linear regression slope 
when the proper model is functional linear regression.’ Comm. Statist.-Theor. Meth., 
A6, 405—416. 

Reiersol, O. (1945). ‘Confluence analysis by means of instrumental sets of variables.’ 
Arkiv for Matematik, Astronomi och Fysik, 32, 1—119. 

Reiersol, O. (1950a). ‘Identifiability of a linear relation between variables which are 
subject to error.’ Econometrica, 18, 575—589. 

Reiersol, O. (1950b). ‘On the identifiability of parameters in Thurstone’s multiple factor
analysis.’ Psychometrika, 15, 121—149.

Reiersol, O., see also Koopmans, T. C.

Rennie, R., and Villegas, C. (1976). ‘Linear relations in time series models, II.’ J. Multi- 
variate Analysis, 6, 46—64. 

Rhodes, E. C. (1927). ‘On lines and planes of closest fit.’ Philosophical Magazine, 7,
357—364.

Richard, J. F., see Florens, J.-P. 

Richardson, D. H. (1968). ‘The exact distribution of a structural coefficient estimator.’
J. Amer. Statist. Assoc., 63, 1214—1226.

Richardson, D. H., and Wu, D. M. (1970). ‘Least squares and grouping method esti- 
mators in the errors-in-variables model.’ J. Amer. Statist. Assoc., 65, 724—748. 


Richardson, D. H., and Wu, D. M. (1971). ‘A note on the comparison of ordinary and
two-stage least squares estimators.’ Econometrica, 39, 973—981.

Ricker, W. E. (1973). ‘Linear regressions in fishery research.’ J. Fishery Research Board,
Canada, 30, 409—434.


Ringstad, V., see Griliches, Z.

Robertson, C. A. (1974). ‘Large-sample theory for the linear structural relation.’ Bio- 
metrika, 61, 353—359. 

Robinson, P. M. (1973). ‘Generalized canonical analysis for time series.’ J. Multivariate
Analysis, 3, 141—160.

Robinson, P. M. (1974). ‘Identification, estimation and large sample theory for regres- 
sions containing unobservable variables.’ Int. Econ. Review, 15, 680—692. 

Robinson, P. M. (1977a). ‘The estimation of a multivariate linear relation.’ J. Multi- 
variate Analysis, 7, 409—425. 

Robinson, P. M. (1977b). ‘Identification, estimation and large sample theory for re- 
gressions containing unobservable variables (reprinted from Robinson (1974)).’ In: 
Latent Variables in Socio-Economic Models (Eds. D. Aigner and A. S. Goldberger). 
North-Holland 103—117.

Robinson, S. M. (1961). ‘Fitting spheres by the method of least squares.’ Communications
of the Assoc. Computing Machinery, 4, 491.

Rubin, H., see Anderson, T. W. 

Sargan, J. D. (1958). ‘The estimation of economic relationships using instrumental
variables.’ Econometrica, 26, 393.

Sargan, J. D. (1959). ‘The estimation of relationships with autocorrelated residuals by 
the use of instrumental variables.’ J. Roy. Statist. Soc., Ser. B, 21, 91—105. 

Sargan, J. D. (1974). ‘The validity of Nagar’s expansion for the moments of econometric
estimators.’ Econometrica, 42, 169—176.

Sargan, J. D., see Mikhail, W. M. 

Sawa, T. (1969). ‘The exact sampling distribution of ordinary least squares and two-stage
least squares estimators.’ J. Amer. Statist. Assoc., 64, 923—937.

Sawa, T. (1972). ‘Finite sample properties of the k-class estimators.’ Econometrica, 40,
653—680.

Sawa, T., see also Anderson, T. W. 

Sawa, T., see also Mariano, R. S.

Schipper, R. M., see van Houwelingen, J.C. 

Schneeweiss, H. (1971). Ökonometrie. Physica-Verlag, Würzburg, ch. 7.

Schneeweiss, H. (1975). ‘Error models with known error variance and with given in- 
strumental variable.’ Paper presented at the Third World Congress of the Econometric 
Society, Toronto. 

Schneeweiss, H. (1976). ‘Consistent estimation of a regression with errors in variables.’
Metrika, 23, 101—115.

Schönfeld, P. (1971). Methoden der Ökonometrie II. ch. 11, Vahlen, Berlin.

Schwetlick, H., and Tiller, V. (1985). ‘Numerical methods for estimating parameters in 
nonlinear models with errors in the variables.’ Technometrics, 27, 17—24. 

Schwetlick, H., and Tiller, V. (1989). ‘Nonstandard scaling methods in trust regions in
Gauss-Newton-Methods.’ SIAM J. Sci. Statist. Comput., 10, to appear.

Scott, E. L. (1950). ‘Note on consistent estimates of the linear structural relation between
two variables.’ Ann. Math. Statist., 21, 284—288.

Scott, E. L., see also Neyman, J.

Seares, F. H. (1944). ‘Regression lines and the functional relation.’ Astrophysical J.,
99/100, 255—263.

Seares, F. H. (1945). ‘Regression lines and the functional relation, II.’ Astrophysical J., 
102, 366—376. 


Seares, F. H., and Joyner, M.C. (1945). ‘Relation between color index and effective 
wave length from the observations of Hertzsprung and Vanderlinden.’ Astrophysical 
J., 102, 366—376. 

Sinclair, L. G., see O'Neill, M.

Smith, F. J. (1965). ‘An algorithm for summing orthogonal polynomial series and their
derivatives with application to curve fitting and interpolation.’ Math. Comp., 19,
33—36.

Smith, F. J., see O'Neill, M.

Sobel, H., see Carlson, F. D. 

Solari, M. E. (1969). ‘The ‘maximum likelihood solution’ of the problem of estimating
a linear functional relationship.’ J. Roy. Statist. Soc., Ser. B, 31, 372—375.

Southwell, W. H. (1976). ‘Fitting data to nonlinear functions, with uncertainties in all
measurement variables.’ Comput. J., 19, 69—73.

Spathe, H. (1967). Algorithmen für multivariable Ausgleichsmodelle. Oldenburg-Verl.,
Wien.

Spiegelman, T. (1979). ‘On estimating the slope of straight line when both variables are
subject to error.’ Ann. Statist., 7, 201—206.

Sprent, P. (1966). ‘A generalized least squares approach to linear functional
relationships.’ J. Roy. Statist. Soc., Ser. B, 28, 278—288.

Sprent, P. (1969). Models in Regression and Related Topics. Methuen, London, ch. 3.6. 

Sprent, P. (1970). ‘The saddle point of the likelihood surface for a linear functional
relationship.’ J. Roy. Statist. Soc., Ser. B, 32, 432—434.

Sprent, P. (1976). ‘Modified likelihood estimation of a linear relationship.’ Studies in
Probability and Statistics (Papers in honour of Edwin Pitman) 109—119. North-Holland,
Amsterdam.

Srba, F., see Hendry, D. F. 

Srivastava, M. P., see Nair, K. R. 

Stein, Ch. (1956). ‘Efficient nonparametric testing and estimation.’ Proc. Third Berkeley
Symposium Math. Statist. Prob., 187—195, Univ. of California Press, Berkeley.

Stepanek, A. (1969). ‘A special procedure in estimating parameters in linear structural 
relationships.’ Bull. Intern. Statist. Inst., 48, Book 2, 179—181. 

Stroud, T. W. F. (1972). ‘Comparing conditional means and variances in a regression 
model with measurement errors of known variances.’ J. Amer. Statist. Assoc., 67, 
406—414. 

Stuart, A. S., see Kendall, M. G. 

Sundberg, R., see Moberg, L. 

Swindel, B. F., see Bower, D. R. 

Taylor, J. (1973). ‘A method of fitting several linear functional relations and of testing
for differences between them.’ Applied Statistics, 22, 239—248.

Teicher, H. (1956). ‘Identification of a certain stochastic structure.’ Econometrica, 24, 
172—177. 

Tessier, G. (1948). ‘La relation d’allometrie: sa signification statistique et biologique.’ 
(With discussion). Biometrics, 4, 14—53. 

Theil, H. (1950a). ‘A rank invariant method of linear and polynomial regression
analysis.’ Nederland Akademie Wetenschappen Proceedings, Ser. A, 53, 383—392,
521—525, 1397—1412.

Theil, H. (1950b). ‘A rank invariant method of linear and polynomial regression
analysis.’ Indagationes Mathematicae, 12, 85—91, 173—177, 467—482.

Theil, H. (1958). Economic Forecasts and Policy. North-Holland, Amsterdam, London.

Theil, H. (1971). Principles of Econometrics. North-Holland, Amsterdam, London. 

Theil, H., see also Hooper, J. W. 

Theil, H., and van Yzeren, J. (1940). ‘The fitting of straight lines if both variables are 
subject to error.’ Ann. Math. Statist. 11, 284—300. 


Theil, H., and van Yzeren, J. (1956). ‘On the efficiency of Wald’s method of fitting
straight lines.’ Rev. Int. Statist. Inst., 24, 17—26.

Theobald, C. M., see Mallison, J. R.

Thomson, G. R. (1916). ‘A hierarchy without a general factor.’ British J. Psychology, 8,
271—281.

Thomson, G. R. (1919). ‘The proof or disproof of the existence of general ability.’ British
J. Psychology, 9, 323—336.

Tiller, V. (1983). ‘Numerische Methoden zur Parameterschätzung in expliziten und
impliziten nichtlinearen Modellen mit Fehlern in den Variablen.’ Dissertation A,
Sektion Mathematik, Martin-Luther-Universität Halle—Wittenberg.

Tiller, V., see Schwetlick, H. 

Tintner, G. (1944). ‘An application of the variate difference method to multiple
regression.’ Econometrica, 12, 97—113.

Tintner, G. (1950). ‘A test for linear relations between weighted regression coefficients.’
J. Roy. Statist. Soc., Ser. B, 12, 273—277. 

Tintner, G. (1952). Econometrics. J. Wiley, New York. 

Titterington, D. M., see Halliday, A. N. 

Tukey, J. W. (1951). ‘Components in regression.’ Biometrics, 7, 33—69. 

Turek, A., see Brooks, C.

van Uven, M. J. (1930). ‘Adjustment of N points (in n-dimensional space) to the best
linear (n—1)-dimensional space.’ Proc. of the Section of Sciences, 38, 143—157, 307—326.

Veitch, L. G., see Fujikoshi, Y. 

Villegas, C. (1961). ‘Maximum likelihood estimation of a linear functional relationship.’ 
Ann. Math. Statist., 32, 1048—1062. 

Villegas, C. (1963). ‘On the least squares estimation of a linear relation.’ Fac. Ingen.
Agrimens. Montevideo Publ. Didact. Inst. Mat. Estadist., 3, 189—203.

Villegas, C. (1964). ‘Confidence region for a linear relation.’ Ann. Math. Statist., 35, 
780—788. 

Villegas, C. (1966). ‘On the asymptotic efficiency of least squares estimators.’ Ann. 
Math. Statist., 37, 1676—1683. 

Villegas, C. (1969). ‘On the least squares estimation of nonlinear relations.’ Ann. Math.
Statist., 40, 462—466.

Villegas, C. (1972). ‘Bayesian inference in linear relations.’ Ann. Math. Statist., 43,
1767—1791.

Villegas, C. (1976). ‘Linear relations in time series models. I.’ J. Multivariate Analysis, 6,
31—45.

Villegas, C., see also Rennie, R. 

Wald, A. (1940). ‘The fitting of straight lines if both variables are subject to error.’ 
Ann. Math. Statist., 11, 284—300. 

Wald, A. (1948). ‘Estimation of a parameter when the number of unknown parameters 
increases indefinitely with the numbers of observations.’ Ann. Math. Statist., 19, 
220—227. 

Wansbeek, T., see Kapteyn, A. 

Ware, J. H. (1972). ‘The fitting of straight lines when both variables are subject to error 
and the ranks of the means are known.’ J. Amer. Statist. Assoc., 67, 891—897. 

Warren, R. D., see Fuller, W. A. 

Watson, G. S., see Carlson, F. D. 

Watson, G. S., see Gleser, L. J. 

Weissman, I., see Karni, E.

Wells, C. S., see Blalock, H. M.

White, J. K., see Fuller, W. A.


Wiley, D. E. (1973). ‘The identification problem for structural equations models with
unmeasured variables.’ In: Structural Equation Models in the Social Sciences (Eds.
A. S. Goldberger and O. D. Duncan). Seminar Press, New York.

Willassen, Y. (1977). ‘On identifiability of stochastic difference equations with errors 
in variables in relation to identifiability of the classical errors-in-variables model.’ 
Scand. J. Statist., 4, 119—124. 

Willassen, Y. (1979). ‘Two clarifications on the likelihood surface in functional models.’ 
J. Multivariate Analysis, 9, 138—149. 

Williams, E. J. (1955). ‘Significance tests for discriminant functions and linear
functional relationships.’ Biometrika, 42, 360—381.

Williams, E. J. (1973). ‘The use of likelihood in data analysis. Error approximation
and accuracy.’ Proc. Sem. Austral. Nat. Univ. Canberra, 1972, pp. 51—66. Univ.
Queensland Press, St. Lucia.

Williamson, J. H. (1968). ‘Least-squares fitting of a straight line.’ Canadian J. Phys., 
46, 1845—1847. 

Winsor, C. P. (1946). ‘Which regression?’ Biometrics Bulletin, 2, 101—109. 

Wolfowitz, J. (1952). ‘Consistent estimators of the parameters of a linear structural
relation.’ Skandinavisk Aktuaristidskrift, 132—151.

Wolfowitz, J. (1953). ‘Estimation by the minimum distance method.’ Ann. Inst. Statist. 
Math., 5, 9—23. 

Wolfowitz, J. (1954a). ‘Estimation of the components of stochastic structures.’ Pro- 
ceedings National Acad. Science of USA, 40, 602—606. 

Wolfowitz, J. (1954b). ‘Estimation of structural parameters when the number of inci- 
dental parameters is unbounded.’ (Abstract). Ann. Math. Statist., 25, 811. 

Wolfowitz, J. (1957). ‘The minimum distance method.’ Ann. Math. Statist., 28, 89—110. 

Wolfowitz, J., see also Kiefer, J. 

Wolter, K. M., see Fuller, W. A. 

Working, H., see Hotelling, H.

Wu, D. M. (1973). ‘Alternative tests of independence between stochastic regressors and
disturbances.’ Econometrica, 41, 733—750.

Wu, D. M., see also Richardson, D. H. 

Wu, G. T., see Reed, A. H. 

York, D. (1966). ‘Least squares fitting of a straight line.’ Canadian J. Phys., 44,
1079—1086.

York, D. (1967). ‘The best isochron.’ Earth and Planet. Science Letters, 2, 479—482.

van Yzeren, J., see Theil, H. 

Zellner, A. (1970). ‘Estimation of regression relationships containing unobservable
variables.’ Int. Econ. Review, 11, 411—454.

Zellner, A. (1971). An Introduction to Bayesian Inference in Econometrics. J. Wiley,
New York, ch. V.

Zucker, L. M. (1947). ‘Evaluation of slope and intercept of straight lines.’ Human 
Biology, 19, 231—259. 


3.9.2 Further references 


Amemiya, T. (1977). ‘The maximum likelihood and the nonlinear three-stage least
squares estimator in the general nonlinear simultaneous equation model.’
Econometrica, 45, 955—968.

Andersen, E. B. (1970a). ‘On Fisher’s lower bound to asymptotic variances in case of
infinitely many incidental parameters.’ Skandinavisk Aktuaristidskrift, 78—85.


Andersen, E. B. (1970b). ‘Asymptotic properties of conditional maximum likelihood
estimators.’ J. Roy. Statist. Soc., Ser. B, 32, 283—301.

Anderson, T. W. (1958). An Introduction to Multivariate Statistical Analysis. John Wiley,
New York.

Bahadur, R. R. (1967). ‘Rates of convergence of estimates and test statistics.’ Ann.
Math. Statist., 38, 303—324.

Bunke, H., and Bunke, O. (Eds.) (1986). Statistical Inference in Linear Models. John
Wiley, Chichester. English edition of Humak, K. M. S. (1977).

Cochran, W.G. (1968). ‘Errors of measurement in statistics.’ Technometrics, 10, 637—666. 

Cochran, W. G. (1970). ‘Some effects of errors of measurement on multiple correlation.’
J. Amer. Statist. Assoc., 65, 22—34.

Dieudonné, J. (1976). Grundzüge der modernen Analysis. VEB Deutscher Verlag der
Wissenschaften, Berlin (Translated from French, 2nd edn).

Eicker, F. (1966). ‘A multivariate central limit theorem for random linear vector forms.’
Ann. Math. Statist., 37, 1825—1828.

Fisher, F. M. (1966). The Identification Problem in Econometrics. McGraw-Hill, New 
York. 

Girko, W. L. (1975). Random Matrices. Nauka, Kiew (in Russian). 

Hannan, E. J. (1971). ‘Non-linear time series regression.’ J. Appl. Prob., 8, 767—780.

Hoadley, B. (1971). ‘Asymptotic properties of maximum likelihood estimators for the
independent not identically distributed case.’ Ann. Math. Statist., 42, 1977—1991.

Höschel, H.-P. (1974). ‘A general approach to correlation and linear dependence between
random vectors with regular or singular covariance matrix.’ Math. Operationsforsch.
Statist., 5, 487—507.

Höschel, H.-P. (1976). ‘Über die Pseudoinverse eines zerlegten positiven linearen
Operators.’ Mathematische Nachrichten, 74, 167—172.

Hsu, P. L. (1941a). ‘On the limiting distribution of roots of a determinantal equation.’
J. London Mathematical Soc., 16, 183—194.

Hsu, P. L. (1941b). ‘On the problem of rank and the limiting distribution of Fisher’s
test function.’ Ann. Eugenics, 11, 39—41.

Humak, K. M. S. (1977). Statistische Methoden der Modellbildung. Band I: Statistische
Inferenz für lineare Parameter. Akademie-Verlag, Berlin.

Ibragimow, I. A. and Khasminski, R. S. (1979). Asymptotic Theory of Estimation. Mir, 
Moscow (in Russian). 

James, A. T. (1954). ‘Normal multivariate analysis and the orthogonal group.’ Ann.
Math. Statist., 25, 40—75.

Jennrich, R. I. (1969). ‘Asymptotic properties of nonlinear least squares estimators.’ 
Ann. Math. Statist., 40, 633—643. 

Johnson, N. L., and Kotz, S. (1977). Distributions in Statistics, Continuous Multivariate 
Distributions. John Wiley, New York. 

Kalbfleisch, J. D., and Sprott, D. A. (1970). ‘Application of likelihood methods to models
involving large numbers of parameters.’ J. Roy. Statist. Soc., Ser. B, 32, 175—208.

Kato, T. (1966). Perturbation Theory for Linear Operators. Springer-Verlag, Berlin. 

Klebanow, L. B., and Melamed, I. A. (1978). ‘Several notes on Fisher information in
presence of nuisance parameters.’ Math. Operationsforsch. Statist., ser. statist., 9,
89—90.

Kleffe, J., and Pincus, R. (1974). ‘Bayes and best quadratic unbiased estimators for 
variance components and heteroscedastic variances in linear models.’ Math. Opera- 
tionsforsch. Statist., 5, 147—159. 

Laha, R. G. (1965). ‘On some problems in canonical correlations.’ Sankhya, 14, 61—66. 

Lawley, D. N., and Maxwell, A. E. (1971). Factor Analysis as a Statistical Method.
Butterworths, London.



LeCam, L. (1956). ‘On the asymptotic theory of estimation and testing hypotheses.’ 
Proc. Third Berkeley Symp. Math. Statist. Prob., 1, 129—156. Univ. of California 
Press, Berkeley. 

Loéve, M. (1977). Probability Theory I (4th edn). Springer-Verlag, New York. 

Loéve, M. (1978). Probability Theory II (4th edn). Springer-Verlag, New York. 

Lukacs, E., and Laha, R. G. (1964). Applications of Characteristic Functions. Griffin & Co. 
Ltd., London. 

MacRae, E. C. (1974). ‘Matrix derivatives with an application to an adaptive linear
decision problem.’ Ann. Statist., 2, 337—346.

Magnus, J. R., and Neudecker, H. (1979). ‘The commutation matrix: some properties
and applications.’ Ann. Statist., 7, 381—394.

Michel, R., and Pfanzagl, J. (1971). ‘The accuracy of the normal approximation for 
minimum contrast estimates.’ Z. Wahrscheinlichkeitstheorie verw. Gebiete, 18, 73—84. 

Nolle, G., and Witting, H. (1970). Angewandte Mathematische Statistik. Teubner, Leipzig. 

Nussbaum, M. (1977). ‘Asymptotic efficiency of estimators in the multivariate linear 
model.’ Math. Operationsforsch. Statist., 8, 173—198. 

Okamoto, M. (1973). ‘Distinctness of the eigenvalues of a quadratic form in a
multivariate sample.’ Ann. Statist., 1, 763—765.

Pfanzagl, J. (1969). ‘On the measurability and consistency of minimum contrast esti- 
mates.’ Metrika, 14, 249—272. 

Pfanzagl, J. (1970). ‘Consistent estimation in the presence of incidental parameters.’
Metrika, 15, 141—148.

Pfanzagl, J. (1973). ‘The accuracy of the normal approximation for estimates of vector
parameters.’ Z. Wahrscheinlichkeitstheorie verw. Gebiete, 25, 171—198.

Philippou, Q. N., and Roussas, G. G. (1973). ‘Asymptotic distribution of the likelihood 
function in the independent not identically distributed case.’ Ann. Statist., 1, 454—471. 

Rao, C. R. (1973). Linear Statistical Inference and its Applications (2nd edn). John
Wiley, New York.

Robinson, P. M. (1972). ‘Non-linear regression for multiple time series.’ J. Appl. Prob., 
9, 758—768. 

Roussas, G. G. (1972). Contiguity of Probability Measures. University Press, Cambridge. 

Rubin, H. (1956). ‘Uniform convergence of random functions with application in
statistics.’ Ann. Math. Statist., 27, 200—203.

Schwetlick, H. (1979). Numerische Lösung nichtlinearer Gleichungen. VEB Deutscher
Verlag der Wissenschaften, Berlin.

Strasser, H. (1973). ‘On Bayes estimates.’ J. Multivariate Analysis, 3, 293—310. 

Sugiura, N. (1976). ‘Asymptotic expansions of the distributions of the latent roots and
the latent vector of the Wishart and multivariate F-matrices.’ J. Multivariate
Analysis, 6, 500—525.

Wald, A. (1949). ‘Note on the consistency of the maximum likelihood estimate.’ Ann. 
Math. Statist., 20, 595—603 (with a note by J. Wolfowitz). 

Weiss, L., and Wolfowitz, J. (1974). Maximum Probability Estimators and Related 
Topics. Lecture Notes in Mathematics Vol. 424. Springer-Verlag, Berlin. 

Witting, H., see Nolle, G. 

Zacks, S. (1971). Theory of Statistical Inference. John Wiley, New York.

Zehna, P. W. (1966). ‘Invariance of maximum likelihood estimation.’ Ann. Math. 
Statist., 37, 755. 


Appendices


A1 Linear algebra


A 1.1. Let the mapping $\lambda: \mathfrak{M}_k \to \mathbb{R}^k$ be given by $\lambda[A] := (\lambda_1[A], \ldots, \lambda_k[A])$
where $\lambda_i[A]$ ($i = 1, \ldots, k$; $\lambda_1[A] \le \cdots \le \lambda_k[A]$) are the $k$ ordered
eigenvalues of $A \in \mathfrak{M}_k$. If $\mathfrak{M}_k$ is understood as a linear subspace of $\mathbb{R}^{k^2}$,
then $\lambda$ is continuous (Kato, 1966).
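A minimal numerical sketch of the continuity statement in [A 1.1], assuming the matrices in question are real symmetric. For a 2 x 2 symmetric matrix the ordered eigenvalues have a closed form, so the statement can be checked directly at a matrix with a repeated eigenvalue, where eigenvectors are not continuous but ordered eigenvalues are. The function names are hypothetical, and the comparison bound (each ordered eigenvalue moves by at most the Frobenius norm of the perturbation, a consequence of Weyl's inequality) is a standard fact that is not stated in the text.

```python
import math

def ordered_eigenvalues(a, b, c):
    """Ordered eigenvalues (smallest first) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return (mean - radius, mean + radius)

def frobenius_norm(e):
    """Frobenius norm of the symmetric matrix [[e0, e1], [e1, e2]]."""
    e0, e1, e2 = e
    return math.sqrt(e0 * e0 + 2.0 * e1 * e1 + e2 * e2)

def max_eigenvalue_shift(base, perturbation):
    """Largest change of an ordered eigenvalue when `perturbation` is added to `base`."""
    moved = tuple(x + e for x, e in zip(base, perturbation))
    before = ordered_eigenvalues(*base)
    after = ordered_eigenvalues(*moved)
    return max(abs(u - v) for u, v in zip(before, after))

# Base matrix: the identity, whose eigenvalue 1 has multiplicity two.
base = (1.0, 0.0, 1.0)

# Shrinking symmetric perturbations; each entry is (eps, shift, norm of perturbation).
shifts = []
for eps in (1e-1, 1e-2, 1e-3):
    perturbation = (eps, eps, -eps)
    shifts.append((eps,
                   max_eigenvalue_shift(base, perturbation),
                   frobenius_norm(perturbation)))

for eps, shift, bound in shifts:
    print(f"eps={eps:g}  eigenvalue shift={shift:.6f}  perturbation norm={bound:.6f}")
```

The eigenvalue shift shrinks together with the perturbation, which is what continuity of the map requires.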


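A 1.2. For $l < k$, let
$$\mathfrak{M}_k^{(l)} := \{A \in \mathfrak{M}_k \mid \lambda_{k-l}[A] < \lambda_{k-l+1}[A]\}.$$
[A 1.1] implies that $\mathfrak{M}_k^{(l)}$ is open in $\mathfrak{M}_k$.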


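A 1.3. Let $\mathfrak{L}_{k,l}$ be the set of the $l$-dimensional linear subspaces of $\mathbb{R}^k$. For
$l < k$ and $\mathcal{J} \in \mathfrak{L}_{k,l}$, $J^{\perp} \in \mathfrak{M}_{k \times (k-l)}$, suppose $\mathcal{R}(J^{\perp}) = \mathcal{J}^{\perp}$. Then, for
$A \in \mathfrak{M}_{k \times m}$, $m \in \mathbb{N}$, the relation $\mathcal{R}(A) + \mathcal{J} = \mathbb{R}^k$ holds if and only
if $r[(J^{\perp})'A] = k - l$.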
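A 1.4. For $l < k$ let the mapping $\eta_{k,l}: \mathfrak{M}_k^{(l)} \to \mathfrak{L}_{k,l}$ be given by
$\eta_{k,l}(A) := \{\text{eigenspace to } \lambda_{k-l+1}[A], \ldots, \lambda_k[A] \text{ of } A\}$.
$\eta_{k,l}$ is continuous on $\mathfrak{M}_k^{(l)}$ (for the topology in $\mathfrak{L}_{k,l}$ see [A 3.16]).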
A 1.5. For Ap € My; let
VO [Zi | Ao], Ly, = [—Agi LJ. 
For l<7'= p, A € MM, p-r) let 
GOTT Ward MPO ara Fores con Wreage np eal 
JI = KS), F', := (Ly! J). Then for 
C = ((Cy))iirg € MF, Cu € MZ, we have 
R(C) + J = R?® if and only if 
ACuJ=p—r, C= Fr(J*C(J+)’ + IOnJ’) Fe 
(E = 0,0;3, Ose t= On, — Cri CC 2) Then 7[C 22] = r[C] — (p —7). 



A 1.6.


A 1.7.


A 1.8.
In addition to the definitions of [A 1.5], let g <7, £ € &y,p-, and 
Doce io Onn Jo= Ry). For k,leN, Ae Ma let 
o(A) := AK(Ly). Then 
(a) £\+ J = RP if and only if there is a (7*, L)'e Lg XK Miia) 
with 
La BiylT ae Ie 


(This holds iff £+ = Lz(£*)+!) 
Here £* and Pys+H are uniquely determined. 

(b) £ + Jo = R? if and only if there is a B € My,.(p-g) with £ =o(B). 
Here B is uniquely determined. 

(c) £ + Jo = R? if and only if there is a Bye Myy (p+ and a 
By € Myx (rq With £ = F,(7+ + Jo(B2)) (= o((By, Be), By = LE, 
where £ is defined according to (a)). 


£ + Jo = R? implies f + J = R?. 


In addition to the definitions of [A 1.5] and[A 1.6], let p —qsin<m, 
DGG ny My € Ie aan) Oo Nor Few et td Reon 


My = (Me € Mex m | ALM | Me]) = £, ALM, ; May) — VW} 


and for L*+ ¢ M4, L€ Mip-yxqr U € Me ms ™m =n —(p —7), 
Vee MU nea lee 
Meo1 uy >= {M, EM em |S Mm, € ean: M, € Worx (pay: 

‘MM, = MU + MV, DM, = 0,5,, 0°", =f: 
Let £ = F;(f+ + J£*) be a representation of £ according [A 1.6a)]. 
Then. J} = W824, holds if ADH)=f*"', L= —FL*, 
V = My, RU’) = W \ A(M)}) is satisfied (here ‘\’ denotes orthogo- 
nal difference). . 


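For $X \in \mathfrak{M}_{n \times k}$, $A \in \mathfrak{M}_{l \times k}$, $W \in \mathfrak{M}_n^{>}$, let $C_A \in \mathfrak{M}_k$ be defined by


Then the matrix $X C_A X'W$ is a projection matrix into the space
$X\mathcal{N}(A)$ with respect to the norm $\|z\|_W = (z'Wz)^{1/2}$, and $\hat\beta = C_A X'Wy$
is a solution of

$$\|y - X\hat\beta\|_W^2 = \min_{\beta \in \mathcal{N}(A)} \|y - X\beta\|_W^2.$$

(Proof as in Rao, 1972, p. 190).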




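A 1.9. Let $X \in \mathfrak{M}_{n \times k}$, $A \in \mathfrak{M}_{l \times k}$, $W \in \mathfrak{M}_n^{>}$, and $r\begin{pmatrix} X \\ A \end{pmatrix} = k$ be true. Then
$X[X'WX + A'A]^{-1}X'W$ is a projection matrix into the space $X\mathcal{N}(A)$
concerning the norm $\|\cdot\|_W$ and

$$A[X'WX + A'A]^{-1}X' = 0, \qquad X'WX[X'WX + A'A]^{-1}X'WX = X'WX$$

holds.
Furthermore, $\hat\beta = [X'WX + A'A]^{-1}X'Wy$ is a solution of

$$\|y - X\hat\beta\|_W^2 = \min_{\beta \in \mathcal{N}(A)} \|y - X\beta\|_W^2.$$

(Proof: Consequence of Bunke and Bunke (1986, [A 1.29]).)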
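The identities of [A 1.9] can be checked on a small example with exact rational arithmetic. The sketch below is illustrative only. It assumes a one-way layout with two groups (so X has rank 2 with k = 3), the side condition A = (0, 1, 1), and diagonal weights W; in this example the row spaces of X and A share only the zero vector and together span the whole space, which is the reading of the rank condition adopted here. All helper names are hypothetical.

```python
from fractions import Fraction

def mat(rows):
    """Convert nested lists of integers into a matrix of exact fractions."""
    return [[Fraction(v) for v in row] for row in rows]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse(a):
    """Gauss-Jordan inverse of a nonsingular square matrix of fractions."""
    n = len(a)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Assumed example: intercept plus two group indicators, constraint b2 + b3 = 0.
X = mat([[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]])
A = mat([[0, 1, 1]])
W = mat([[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 1, 0], [0, 0, 0, 2]])
y = mat([[1], [2], [3], [5]])

Xt = transpose(X)
S = matmul(matmul(Xt, W), X)                      # X'WX (singular here)
M_inv = inverse(add(S, matmul(transpose(A), A)))  # [X'WX + A'A]^{-1}
P = matmul(matmul(matmul(X, M_inv), Xt), W)       # candidate projection

is_idempotent = matmul(P, P) == P
annihilates = matmul(matmul(A, M_inv), Xt) == [[0, 0, 0, 0]]
is_g_inverse = matmul(matmul(S, M_inv), S) == S

# P leaves X b unchanged for a vector b with A b = 0.
b = mat([[1], [1], [-1]])
fixes_constrained = matmul(P, matmul(X, b)) == matmul(X, b)

# The estimator satisfies the side condition A beta = 0.
beta = matmul(matmul(matmul(M_inv, Xt), W), y)
constraint_ok = matmul(A, beta) == [[0]]

print(is_idempotent, annihilates, is_g_inverse, fixes_constrained, constraint_ok)
```

Exact fractions are used instead of floating point so that the matrix identities can be compared with `==` rather than with a tolerance.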
A 1.10. Tiny € Moret x ket is defined by 
ATi, Ae A € Mest 


Let Lin —— Lit}: For A E Mr sets B € eh obseop it holds that (A & B) dint 
= Tme(B ® A), Lot = Le) = Les Ten ig = Ter (MacRae, 1974; 
Magnus and Neudecker, 1979).


Level = tl 2 + Ijp). Lp is a projection matrix onto the linear 
space {A | 7 €M,} in R?*, r[,] = p(p + 1)/2. Let F, € iene 
Ee, hee 1) Oe ie dis = Inipsiyja- For A = (ei Wes me Aiea (ay, 0: 
yp, Ag9,..-, Ap, ---, Az). Then there isa C € Dee Cuil x (kde-+1)/2) Such sha 


A= CA, Ae Mg. 
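The properties collected in [A 1.10] can be verified mechanically for small dimensions. The sketch below builds the permutation matrix that maps vec A to vec A' for an m x n matrix A and checks the transposition, Kronecker-product and symmetrizer identities in the form given by Magnus and Neudecker (1979). The names K and the helper functions are labels chosen for this sketch, not the notation of the text, and the index placement in the Kronecker identity, (A ⊗ B) K(n, q) = K(m, p) (B ⊗ A) for A of order m x n and B of order p x q, is the one that follows from this definition.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def vec(a):
    """Stack the columns of a matrix (given as a list of rows) into one list."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def commutation(m, n):
    """Permutation matrix K with K vec(A) = vec(A') for every m x n matrix A."""
    k = [[0] * (m * n) for _ in range(m * n)]
    for i in range(m):
        for j in range(n):
            # a_ij sits at position j*m + i of vec(A) and i*n + j of vec(A').
            k[i * n + j][j * m + i] = 1
    return k

A = [[1, 2, 3], [4, 5, 6]]        # m = 2, n = 3
B = [[0, 1], [2, 3], [4, 5]]      # p = 3, q = 2
m, n, p, q = 2, 3, 3, 2

K_mn = commutation(m, n)
maps_vec = matvec(K_mn, vec(A)) == vec(transpose(A))
inverse_is_transpose = matmul(transpose(K_mn), K_mn) == identity(m * n)
transpose_swaps = transpose(K_mn) == commutation(n, m)
kron_identity = (matmul(kron(A, B), commutation(n, q))
                 == matmul(commutation(m, p), kron(B, A)))

# Symmetrizer N = (I + K(p, p)) / 2, checked through 2N to stay in integers:
# N is idempotent iff (2N)(2N) = 2(2N); its trace, hence rank, is p(p + 1)/2.
two_N = [[x + y for x, y in zip(r1, r2)]
         for r1, r2 in zip(identity(p * p), commutation(p, p))]
idempotent = matmul(two_N, two_N) == [[2 * v for v in row] for row in two_N]
rank_N = sum(two_N[i][i] for i in range(p * p)) // 2

print(maps_vec, inverse_is_transpose, transpose_swaps, kron_identity,
      idempotent, rank_N)
```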


References 


Bunke, H., and Bunke, O. (Eds.) (1986). Statistical Inference in Linear Models. John
Wiley, Chichester.

Kato, T. (1966). Perturbation Theory for Linear Operators. Springer-Verlag, Berlin.

MacRae, E. C. (1974). ‘Matrix derivatives with an application to an adaptive linear
decision problem.’ Ann. Statist., 2, 337—346.

Magnus, J. R., and Neudecker, H. (1979). ‘The commutation matrix: Some properties
and applications.’ Ann. Statist., 7, 381—394.

Rao, C. R. (1972). Linear Statistical Inference and its Applications (2nd edn). John Wiley, 
New York. 


A2 Asymptotics


A 2.1. For $n = 0, 1, 2, \ldots$ let random vectors $Z_n$ with values in $\mathbb{R}^k$ be given.
If, for the sequence of the distribution functions $F_n(z) = P\{Z_n < z\}$
of $Z_n$, $F_n(z) \to F_0(z)$ as $n \to \infty$ is true for all continuity points of the limiting
distribution function $F_0$, then the sequence of distributions of $Z_n$
is called weakly convergent towards the distribution of $Z_0$ (notation:
$\mathcal{L}(Z_n) \Rightarrow \mathcal{L}(Z_0)$ as $n \to \infty$).
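A small worked illustration of the definition in [A 2.1], under assumptions chosen for this sketch: $Z_n$ is uniformly distributed on the grid $\{1/n, 2/n, \ldots, n/n\}$ and $Z_0$ is uniform on $[0, 1]$. The distribution functions use the strict inequality $P\{Z_n < z\}$ as in the text. Since $F_0$ is continuous everywhere, weak convergence requires $F_n(z) \to F_0(z)$ at every $z$; here the gap is at most $1/n$. The function names are hypothetical.

```python
from fractions import Fraction

def F_n(n, z):
    """P{Z_n < z} for Z_n uniform on the grid {1/n, 2/n, ..., n/n}."""
    count = sum(1 for j in range(1, n + 1) if Fraction(j, n) < z)
    return Fraction(count, n)

def F_0(z):
    """Distribution function of the uniform distribution on [0, 1]."""
    return min(max(Fraction(z), Fraction(0)), Fraction(1))

# Evaluation points in [0, 1]; every one is a continuity point of F_0.
grid = [Fraction(i, 200) for i in range(201)]

def max_gap(n):
    """Largest distance between F_n and F_0 over the evaluation grid."""
    return max(abs(F_n(n, z) - F_0(z)) for z in grid)

gaps = {n: max_gap(n) for n in (5, 50, 500)}
for n, gap in gaps.items():
    print(f"n={n:4d}  max |F_n - F_0| on grid = {float(gap):.4f}")
```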


A 2.2. We consider two sequences $\{Q_n\}$ and $\{P_n\}$ of probability distributions
$Q_n$ and $P_n$ over measurable spaces $(\mathfrak{X}_n, \mathfrak{B}_n)$. The sequence $\{Q_n\}$ is
said to be contiguous to the sequence $\{P_n\}$ if $P_n(B_n) \to 0$ with
$B_n \in \mathfrak{B}_n$ implies $Q_n(B_n) \to 0$. If $Q_n$ and $P_n$ have the densities $q_n(x)$
and $p_n(x)$ with respect to a $\sigma$-finite measure $\mu_n$ over $(\mathfrak{X}_n, \mathfrak{B}_n)$, we use
the same terminology for the sequences of the corresponding densities
(Hájek and Šidák, 1967).


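A 2.3. (1st Lemma by Le Cam)
If $Q_n$ and $P_n$ have the densities $q_n(x)$ and $p_n(x)$ with respect to a $\sigma$-finite
measure $\mu_n$ over $(\mathfrak{X}_n, \mathfrak{B}_n)$, and if with

$$L_n(x) = \begin{cases} q_n(x)/p_n(x) & \text{if } p_n(x) > 0, \\ 1 & \text{if } p_n(x) = q_n(x) = 0, \\ +\infty & \text{if } p_n(x) = 0 < q_n(x) \end{cases}$$

and $X_n \sim P_n$, for a positive constant $b^2$,

$$\mathcal{L}\{\ln L_n(X_n)\} \Rightarrow N(-b^2/2,\ b^2) \quad (n \to \infty)$$

is true, then the sequence $\{Q_n\}$ is contiguous to the sequence $\{P_n\}$
(Hájek and Šidák, 1967).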
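A 2.4. (3rd Lemma by Le Cam)
Let $\{Q_n\}$ be contiguous to $\{P_n\}$ and with the notation of [A 2.3] let,
for a sequence of statistics $S_n$, the random vector $(S_n, \ln L_n)'$ under $P_n$
asymptotically have the distribution

$$N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & -2\mu_2 \end{pmatrix} \right).$$

Then, under $Q_n$, $S_n$ is asymptotically $N(\mu_1 + \sigma_{12}, \sigma_1^2)$ distributed
(Hájek and Šidák, 1967).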


A 2.5. For a density $f$ with finite Fisher information $J(f)$ we consider the
densities


mies Tay ca) and ae po ) = Te 


for « = (a5, sisis's a) with d. =— ae and max (dn, haa d,)? 0) 
as well as N i=1 1Si<n a 


LG)S NG ay an) ee ee ee 0 =O? <0 




Furthermore, let , 


(n) 
Qa (x) ‘ 
a if ph” (ar SKIP 
s pi” (x) a) 
Lgl) = “emi —/ ln) 
1 if py (a) = ad (~) = 0, 
oo Tg a) 0 <e ge) 


for x € IR” denote the likelihood ratio and let T” denote the statistics 


% See 
Ty? = —D (dn, — dn) — In f(x) 
i=1 dx 


2=2,-d 
Then 


Pp) 


In L) — +>u Fy 0 


and 


f(a Ll) — > NV es 6, v) 


are true and thus, according to [A 2.3], {gq} is contiguous to {p\”} 
(Hajek and Sidak, 1967). 


A 2.6. Let $\{P_n\}$ and $\{Q_n\}$ be two sequences of probability distributions $P_n$,
$Q_n$ over the measurable spaces $(\mathfrak{X}_n, \mathfrak{B}_n)$ with the densities $p_n$ and $q_n$
with respect to a $\sigma$-finite measure $\mu_n$ over $(\mathfrak{X}_n, \mathfrak{B}_n)$. The sequence $\{q_n\}$
is assumed to be contiguous to the sequence $\{p_n\}$. Then to every $\varepsilon > 0$
there is $\delta > 0$ such that for every sequence $\{B_n\}$ with $B_n \in \mathfrak{B}_n$, $Q_n(B_n) < \varepsilon$
holds for almost all $n$ whenever $P_n(B_n) < \delta$ is satisfied for almost
all $n$ (Jurečková, 1969).


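A 2.7. For $n = 1, 2, \ldots$ let the measure spaces $(\mathfrak{X}_n, \mathfrak{B}_n, \mu_n)$ be given with
$\sigma$-finite measures $\mu_n$. For an open subset $\Theta$ of $\mathbb{R}^k$ let $\mathfrak{P}_n = \{P_{n\vartheta} \mid \vartheta \in \Theta\}$
be a family of probability distributions over $(\mathfrak{X}_n, \mathfrak{B}_n)$ and let $\mathfrak{P}_n$
be dominated by $\mu_n$. By $p_{n\vartheta} = \dfrac{dP_{n\vartheta}}{d\mu_n}$ we denote a variant of the
Radon-Nikodym density with respect to $\mu_n$. For a sequence of matrices
$S_n \in \mathfrak{M}_k$ with $\|S_n\| \to 0$, for $\vartheta_n = \vartheta + S_n h$ with a vector
$h \in \mathbb{R}^k$ and $X_n \sim P_{n\vartheta}$ the likelihood ratio

$$L_n(h, \vartheta, X_n) = \begin{cases} \dfrac{p_{n\vartheta_n}(X_n)}{p_{n\vartheta}(X_n)} & \text{if } p_{n\vartheta}(X_n) > 0, \\ 0 & \text{if } p_{n\vartheta}(X_n) = 0 \end{cases}$$

is uniquely defined for almost all $n$.

If, for all $h \in \mathbb{R}^k$ and $\vartheta \in \Theta$ there are a positive definite matrix $I(\vartheta)$
and sequences of functions $\Delta_n(\vartheta)$ and $\psi_n(h, \vartheta)$ with

$$\Delta_n(\vartheta): (\mathfrak{X}_n, \mathfrak{B}_n) \to (\mathbb{R}^k, \mathfrak{B}^k), \qquad \psi_n(h, \vartheta): (\mathfrak{X}_n, \mathfrak{B}_n) \to (\bar{\mathbb{R}}^1, \bar{\mathfrak{B}}^1)$$

such that

$$L_n(h, \vartheta, X_n) = \exp\left\{ h' I^{1/2}(\vartheta)\, \Delta_n(\vartheta)(X_n) - \tfrac{1}{2}\, h' I(\vartheta) h + \psi_n(h, \vartheta)(X_n) \right\}$$

with

$$\mathcal{L}\{\Delta_n(\vartheta) \mid \vartheta\} \Rightarrow N_k(0_k, I_k) \quad (n \to \infty)$$

and

$$\psi_n(h, \vartheta) \xrightarrow{P_{n\vartheta}} 0 \quad \text{for all } \vartheta \in \Theta,\ h \in \mathbb{R}^k,$$

then the sequence $\{\mathfrak{P}_n\}$ is called locally asymptotically normal (e.g.
Ibragimov and Khasminski, 1981). Here $\bar{\mathbb{R}}^1$ denotes the real line
extended by $\{-\infty, \infty\}$, and $\bar{\mathfrak{B}}^1$ the corresponding Borel $\sigma$-algebra.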

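A 2.8. With the notation of [A 2.7] let $\{\mathfrak{P}_n\}$ be locally
asymptotically normal, and let the densities $p_{n\vartheta}$ be continuous
in $\vartheta$. Then there is a Lebesgue zero set $N$ such that for any
sequence of estimators $\hat\vartheta_n = \hat\vartheta_n(X_n)$ with
$\mathcal{L}\{S_n^{-1}(\hat\vartheta_n - \vartheta)\} \xrightarrow[n\to\infty]{} N_k(0_k, V(\vartheta))$
the inequality $V(\vartheta) \ge I^{-1}(\vartheta)$ is fulfilled for
$\vartheta \in \Theta \setminus N$

(Bahadur, 1967; Roussas, 1972; Ibragimov and Khasminski, 1981).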

A 2.9. Let (X,%, ) be a measure space and {P, | #€ O}, O—R* be a para- 


metric family of probability distributions on (2, 8) dominated by wu. 
Let (Zn, Bn, Png) denote the n-fold product of (%, 6, Ps) with itself, 
and {P,,} be the corresponding sequence of distribution families. Let 


dP = 
© be an open subset of R*, f(-, 3) = ia (-), and © the closure of 9 
Mu 


in the one-point compactification of IR*. Let f(x, ) be defined for 
# € O and continuous in # on @ (for u - almost all x €\%). Let &, denote 
a maximum likelihood estimator of #, based on observations x, ..., Xp, 
ie. d,, is a measurable solution of 


I (xi, 8 lay 335 0) = sup iI f(aj, 0). 
6B i=l 
(a) Under the conditions: 
1. For every y > 0 and % € @ it holds that 
inf sf (Pa, 8) — fw, 5)? du > 0; 


9€0:|9—d|>y & 


A.2. Asymptotics 421 


A 2.10. 


(b 


~~ 


(c) 


2. For all 8 € O 


sup (f(x, 8) — fu?(x, + h))? du => 0; 


X h+9eB,|nli<s 


and 


3. If O is unbounded, then for all 8 ¢€ O 
lim f sup (f2(a, 8) p2(a, 8+ h)) du <1 


dco L 9+heb,||h||>6 


d, is a consistent estimator for 0, (Ibragimov and Khasminski, 
1981). 


The conditions 


4. f(x, 3) is twice continuously differentiable in 0 


f(x, 9) 
5. [eet #) du => OF ks eat du = On scks OE 0, 
se Zi 


6. J(P) := Hy( In f(X, 8) (a5 In f(X, 9)’ exists and is positive 
definite for & € @. 


7. For all & € @ there is a function h(x) and a 6 > 0 with 
2 
up (: In f(x, 3) ) 
o=9 


5€0:||8—d||<6 08, 08; 
and Eyh(X) < oo 

with I(9) := J(d), S, = n-¥2I, ensure local asymptotic normality 

of the sequence {P,} (Bahadur, 1967; Ibragimov and Khasminski, 

1981). 


| < h(x) 


n—>oCo 


efficient in the sense of [A 2.8]) in case the conditions 1, ..., 7 are 
satisfied (Witting and Nolle, 1970). 


£{n(d, — 3) ——+ N(0, J(9)*) holds (i.e. 4, is asymptotically 


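Let $f : \mathbb{R}^r \times \mathbb{R}^s \to \mathbb{R}^p$ be a function
which is continuously differentiable on $\mathbb{R}^r \times \mathbb{R}^s$,
$\{X_n\}$ be a sequence of $r$-dimensional random vectors, and $\{Z_{0n}\}$
with $Z_{0n} = [X_{0n}' \mid Y_{0n}']'$ a sequence of random vectors with
values in $\mathbb{R}^{r+s}$ with the properties

$$\mathcal{L}\{\sqrt{n}\,(X_n - X_{0n})\} \xrightarrow[n\to\infty]{} N(0, A)$$

and

$$Z_{0n} \xrightarrow[n\to\infty]{} Z_0 = [X_0' \mid Y_0']'.$$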


422 


A. Appendices 


A 2.11. 


A 2.12. 


Then, with $H = \partial_x f(X_0, Y_0)$ and $B = HAH'$,

$$\mathcal{L}\{\sqrt{n}\,(f(X_n, Y_{0n}) - f(X_{0n}, Y_{0n}))\} \xrightarrow[n\to\infty]{} N(0, B)$$

holds.
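
The limit statement above can be checked by simulation in a scalar special case. The following is only a sketch under assumptions that are not in the text: the function f(x, y) = y x^2, the constant sequences X_0n = mu and Y_0n = y0, and a normal sample mean for X_n (so that A = 1) are all chosen purely for illustration.

```python
import math
import random
import statistics

random.seed(1)

def f(x, y):
    """Illustrative smooth function f(x, y) = y * x^2 (an assumption, not from the text)."""
    return y * x * x

mu, y0 = 2.0, 3.0          # X_0n = mu and Y_0n = y0 are taken constant in n
n, reps = 10_000, 50_000
A = 1.0                    # limiting variance of sqrt(n) * (X_n - mu)
H = 2.0 * y0 * mu          # derivative of f with respect to x at (mu, y0)
B = H * A * H              # limiting variance H A H'

draws = []
for _ in range(reps):
    # The mean of n independent N(mu, 1) variables is exactly N(mu, 1/n).
    x_n = random.gauss(mu, 1.0 / math.sqrt(n))
    draws.append(math.sqrt(n) * (f(x_n, y0) - f(mu, y0)))

empirical_var = statistics.pvariance(draws)
print(round(empirical_var, 1), "vs", B)
```

For large n the empirical variance should be close to B = HAH'; the remaining gap is Monte Carlo error plus a term of order 1/n.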


Assume given for m = mp, m + 1,...,a sequence of integers {n(m)}, 
a sequence of random matrices {Q,,} with values in It> and a sequence 
of nonrandom matrices An € Mnyim)xp be given. Furthermore, let 
n(m) 


—— « with OS «<1 and m144,,An Goer A EME be ful- 


m= m-—>co 


m 
filled. Let mQ,, be W,(n(m), Py An) distributed (for the definition of the 
noncentral Wishart distribution see Bunke and Bunke, 1986, [A 2.13]). 
Then it follows that 


{mos (2 AS OSULS 5 dk m4nAn)t 


m 
ar Np xp(0, 20(Z ® Z) Ip + 40,(A @ Z)T,). 


m—>oo 


(I, is defined in [A 1.10]). 


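Let $(\mathfrak{X}, \mathfrak{B})$ be a measurable space and
$\mathfrak{P} = \{P_\vartheta \mid \vartheta \in \Theta\}$,
$\Theta \subseteq \mathbb{R}^k$ be a parametric distribution family on
$(\mathfrak{X}, \mathfrak{B})$. Let $\Theta$ be an open subset of
$\mathbb{R}^k$ and denote by $\bar\Theta$ the closure of $\Theta$ in the
one-point compactification of $\mathbb{R}^k$. Let $\bar{\mathbb{R}}^1$ denote
the real line extended by $\{-\infty, \infty\}$, and $\bar{\mathfrak{B}}^1$
the corresponding $\sigma$-algebra of Borel sets. A family of functions
$g(\cdot, \tau) : (\mathfrak{X}, \mathfrak{B}) \to (\bar{\mathbb{R}}^1, \bar{\mathfrak{B}}^1)$
for $\tau \in \bar\Theta$ is called a family of contrast functions for
$\mathfrak{P}$ if $E_\vartheta g(X, \tau)$ ($X \sim P_\vartheta$) exists for
all $\vartheta \in \Theta$, $\tau \in \bar\Theta$ and if

$$E_\vartheta g(X, \vartheta) < E_\vartheta g(X, \tau) \quad \forall\, \vartheta \in \Theta,\ \tau \in \bar\Theta,\ \vartheta \ne \tau.$$

Let $X_i$, $i = 1, \ldots, n$ be independent random variables with
distribution $P_\vartheta$, $\vartheta \in \Theta$, and let
$(\mathfrak{X}_n, \mathfrak{B}_n)$ denote the $n$-fold product of
$(\mathfrak{X}, \mathfrak{B})$ with itself. A minimum contrast estimator
$\hat\vartheta_n^g$ for $\vartheta$, based on the observations
$X_1, \ldots, X_n$, is a $(\mathfrak{X}_n, \mathfrak{B}_n)$ measurable
solution of

$$\sum_{i=1}^n g(X_i, \hat\vartheta_n^g) = \inf_{\vartheta \in \bar\Theta} \sum_{i=1}^n g(X_i, \vartheta).$$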
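
As a small illustration of the definition above, the sketch below minimises an empirical contrast sum over a finite grid of candidate parameter values. The data, the grid, and the two contrast functions (squared and absolute deviation, which are contrast functions only for suitable location families) are assumptions chosen for the example, and the grid search merely stands in for the infimum over the parameter set.

```python
def minimum_contrast(data, contrast, grid):
    """Return the grid point that minimises the empirical contrast sum."""
    return min(grid, key=lambda tau: sum(contrast(x, tau) for x in data))

data = [1.0, 2.0, 3.0, 4.0, 100.0]           # illustrative observations
grid = [k / 100 for k in range(0, 10001)]    # candidate parameter values in [0, 100]

def squared(x, tau):
    """Contrast whose empirical minimiser is the sample mean."""
    return (x - tau) ** 2

def absolute(x, tau):
    """Contrast whose empirical minimiser is a sample median."""
    return abs(x - tau)

est_squared = minimum_contrast(data, squared, grid)
est_absolute = minimum_contrast(data, absolute, grid)
print(est_squared, est_absolute)
```

The two estimates differ strongly on this sample because the absolute-deviation contrast is far less sensitive to the single large observation.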

Let F denote a set of families of contrast functions. Under certain 
regularity conditions on F and P (see Pfanzagl, 1969, Michel and 
Pfanzagl, 1971; Pfanzagl, 1973) it holds that: 


(a) for g € F there exists almost surely a minimum contrast esti- 
mator 3% for almost all n, and 


£{ni!2(92 — 8)} <> N;,(0,, V9(9)), 


n—co 


V8) = (V3(8))-? V9) (V5(8))-2, 


A.2. Asymptotics 423 


EE ee ee ee 


A 2.13. 


V4(0) := Eo(d0g(X, 9)’ Aog(X, 9), 


g s SOE e 
V3(0) := —E (( 20, 50, g(X, »)) : 


(b) Let P be dominated by a o-finite measure mw. There exists a 
family of contrast functions f)(-, 7), t € © and a version of the 


dP 
u-density f(-, 3) = ae &€ 0, such that fo(-, 8) = —In f(-, 8), 
lt 


8 € O. If $4 exists, this estimator is a maximum likelihood esti- 
mator. 


(c) If fo(-, -) € F, then it holds that 
Vio) 2 VEO) (VEO), 86 @, ig.eF. 


We consider the linear rank statistic 
n 
Sn = DY (Ca, — Cn) An(R,,), Where Ry,,..., Ry, denote the ranks of 
i=1 


the independent random variables X,,,...,Xn,. For a function 9, 
which is square integrable on the interval [0, 1], let the numbers a,(?) 
fulfil the relation 


1 


f (@n(1 + [en]) + o())? dt =e 0 


0 


and the constants ¢,), ..., Cnn fulfil the condition 


n 
1Sisn = 
Ce Ce 
=1 


For a density fy with finite Fisher information J(f,) let the random 
vector (X,,..., X,)’ have the density 


Gna = [1 foe: —4,,), m= 1,2,..: 
4=1 
and let 


TE NDC see On tegen 8 0 209 00: 
A 


l 


Then S, is asymptotically N(q,, 05) distributed with 


n 1 


Mae = 2g; (Cn, — ¢,] (dn, oe dn) | p(t) vlé, fo) dé, 


i=1 0 


424 


A 2.14. 


A 2.15. 


A. Appendices 


o2 = D (en, — bn)® f (ptt) — #)? at 


d 
pt, f) me Sade: In f(2)|z=7-W> Oa and 


1 . 
@ = | glt) dt, (Hajek and Sidak, 1967). 
0 

Let X,,, ..., Xn» be independently and identically distributed random 
variables with the absolutely continuous distribution function F. In 
[A 2.13] let especially a,(7) = Hg(U), where U denotes the 7th 
order statistics of a sample of size n with respect to the uniform 
distribution R[0, 1] (cf. Bunke and Bunke, 1986, [A 2.35]). For the 
quadratic integrable function from [A 2.13] let 


5 (p(t) — 9) dt > 0 


0 


be true. Then, the condition 


aoa et 
(Cn, Cn) ) (9) 


implies 

E(S, — T,)? 

AS, — Ts 9 (Hajek and Sidak, 1967) 
% (Cn, — €n)? 

t=1 


Let Xy,,...,Xyy be a sample to the Si fo with the finite Fisher 
information I(fy) and let XY) < X®) <...< XW denote the ordered 


sample. For a sequence {ey} of positive Hema bers €y With ¢, ———> 0 and 


m—>oo 
N1/4¢2, ———+ 00 we put my = [N®/4e,?] and ny = [N1/4c3] as well as 


qN 
= 


= 
Myny hy,+my hy, —my) 1— 
ie {|x yt mw) x 4 | 1 


n— 


b= | for 15) S ny and 


N+ Hl 
— [xGreatee) Se adherent 


N 
h 
if Au st < Aha, j= 13k. 


O otherwise. 


A.2. Asymptotics 425 


cE ee a ee eS 


A 2.16. 


A 2.17.


A 2.18. 


A 2.19. 


A 2.20. 


(Here [gq] denotes for each g € R! the greatest integer that does not 
exceed q) 
G y(t) is ‘a consistent estimator for 


4) 
v(t, fo) = mee In fo(2)|,— ry (0<t<1) 


in the sense that the integral 


j [Bv(t) — lt, fo)? dt 
converges to zero in probability if we have the density 


n 
=[[folw:) (Hajek and Sidak, 1967). 
i=1 
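For $n = 1, 2, \ldots$ we consider a sequence of tests $\varphi_n$ for the
testing problem

$$H : \vartheta \in \Theta_H \quad \text{with } \Theta_H \subset \Theta.$$

$\{\varphi_n\}$ is called an asymptotic $\alpha$-test if the condition
$\limsup_{n\to\infty} E_\vartheta \varphi_n \le \alpha$ holds for all
$\vartheta \in \Theta_H$.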
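
To make the definition concrete, the following sketch computes the exact rejection probability of a two-sided score test for a binomial proportion at several sample sizes. The test, the null value p0 = 0.3 and the nominal level 0.05 are assumptions chosen for illustration; the hypothesis here is simple, so the limsup condition only has to be checked at one parameter value.

```python
import math

def binom_pmf(n, k, p):
    """Binomial pmf evaluated in log space so that large n stays stable."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def exact_size(n, p0=0.3, z=1.959964):
    """Exact probability under p0 that |k/n - p0| exceeds z * sqrt(p0 (1 - p0) / n)."""
    half_width = z * math.sqrt(p0 * (1.0 - p0) / n)
    return sum(binom_pmf(n, k, p0)
               for k in range(n + 1)
               if abs(k / n - p0) > half_width)

sizes = {n: exact_size(n) for n in (20, 200, 2000)}
for n, size in sizes.items():
    print(n, round(size, 4))
```

For small n the exact size can deviate noticeably from the nominal level because of the discreteness of the binomial distribution; the deviation shrinks as n grows, which is the behaviour the definition asks for.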


(Lemma of Fatou) If $\{f_n\}$ is a sequence of nonnegative $\mu$-integrable
functions with $\liminf_{n\to\infty} \int f_n \, d\mu < \infty$, then the
function $f(x) = \liminf_{n\to\infty} f_n(x)$ is $\mu$-integrable and

$$\int f \, d\mu \le \liminf_{n\to\infty} \int f_n \, d\mu \quad \text{(Halmos, 1950).}$$


(Lebesgue theorem) Let $\{f_n\}$ be a sequence of $\mu$-integrable functions
$f_n$ with $f_n \to f$ and let $g$ be a $\mu$-integrable function with the
property $|f_n(x)| \le g(x)$ $\mu$-almost everywhere, $n = 1, 2, \ldots$
Then $f$, too, is $\mu$-integrable and

$$\int f_n \, d\mu \xrightarrow[n\to\infty]{} \int f \, d\mu \quad \text{(Halmos, 1950).}$$


Let $\mathfrak{X}$ be a Euclidean space and $\Theta$ a compact subset of a
Euclidean space. If $g$ is a bounded and continuous function on
$\mathfrak{X} \times \Theta$ and if $F_n(x) \xrightarrow[n\to\infty]{} F(x)$
is valid for all $x \in \mathfrak{X}$ for distribution functions
$F_1, F_2, \ldots$ and $F$ on $\mathfrak{X}$, then the integral
$\int g(x, \vartheta) \, dF_n(x)$ tends uniformly in $\vartheta \in \Theta$
to the limit $\int g(x, \vartheta) \, dF(x)$ (Jennrich, 1969).


(Generalized lemma by Chow) Let $\{X_i\}$ be a sequence of independently
and identically distributed random vectors with values in $\mathbb{R}^k$ and
$E X_i = 0$, $E\|X_i\|^2 < \infty$. For an arbitrary array of nonrandom
matrices $G_{ni}$ ($i = 1, \ldots, k_n$; $n = 1, 2, \ldots$) we then have


kn 
y Gn,Xi 
tet 8S (hows 1066). 
kn 
mY Gall 
References 


Bahadur, R. R. (1967). ‘Rates of convergence of estimates and test statistics.’ Ann.
Math. Statist., 38, 303-324.

Chow, Y. S. (1966). ‘Some convergence theorems for independent random variables.’
Ann. Math. Statist., 37, 1482-1493.

Hájek, J. (1969). A Course in Nonparametric Statistics. Holden-Day, San Francisco.

Hájek, J., and Šidák, Z. (1967). Theory of Rank Tests. Academia, Prague.

Halmos, P. R. (1950). Measure Theory. Van Nostrand, Princeton.

Ibragimov, I. A., and Khasminski, R. Z. (1981). Statistical Estimation: Asymptotic
Theory. Springer, New York.

Jennrich, R. I. (1969). ‘Asymptotic properties of nonlinear least squares estimators.’
Ann. Math. Statist., 40, 633-643.

Jurečková, J. (1969). ‘Asymptotic linearity of a rank statistic in regression parameter.’
Ann. Math. Statist., 40, 889-900.

Michel, R., and Pfanzagl, J. (1971). ‘The accuracy of the normal approximation for
minimum contrast estimates.’ Z. Wahrscheinlichkeitstheorie verw. Gebiete, 18, 73-84.

Pfanzagl, J. (1969). ‘On the measurability and consistency of minimum contrast
estimates.’ Metrika, 14, 249-272.

Pfanzagl, J. (1973). ‘The accuracy of the normal approximation for estimates of vector
parameters.’ Z. Wahrscheinlichkeitstheorie verw. Gebiete, 25, 171-198.

Roussas, G. G. (1972). Contiguity of Probability Distributions. Cambridge University
Press, Cambridge.

Witting, G., and Nölle, G. (1970). Angewandte Mathematische Statistik. Teubner, Leipzig.


A3 Addenda


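A 3.1. Let $g$ be a real-valued function on $\mathfrak{X} \times \Theta$,
where $\Theta$ is a compact subset of a Euclidean space and $\mathfrak{X}$ a
measurable space. For each $\vartheta$ from $\Theta$ let $g(x, \vartheta)$ be
a measurable function in $x$ and for each fixed $x \in \mathfrak{X}$ let it
be continuous in $\vartheta \in \Theta$. Then there exists a measurable
mapping $\hat\vartheta : \mathfrak{X} \to \Theta$ with

$$g(x, \hat\vartheta(x)) = \sup_{\vartheta \in \Theta} g(x, \vartheta) \quad \text{(Jennrich, 1969).}$$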


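A 3.2. A sequence $\{Y_n\}$ of random variables is said to be a martingale
with respect to a sequence $\{\mathfrak{B}_n\}$ of $\sigma$-algebras
$\mathfrak{B}_n$ with $\mathfrak{B}_n \subseteq \mathfrak{B}_{n+1}$ if

$$E(Y_{n+1} \mid \mathfrak{B}_n) = Y_n, \quad n = 1, 2, \ldots$$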


A.3. Addenda 427 


A 3.3. 


A 3.4. 


A 3.5. 


A 3.6. 


A 3.7. 


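Let $\{X_n\}$ be a sequence of random variables and $\{\mathfrak{B}_n\}$ be
a nondecreasing sequence of $\sigma$-algebras with
$E(X_n \mid \mathfrak{B}_{n-1}) = 0$, $n = 1, 2, \ldots$
($\mathfrak{B}_0 = \{\emptyset, \Omega\}$). Then
$\{Y_n = \sum_{i=1}^n X_i\}$ is a martingale with respect to
$\{\mathfrak{B}_n\}$ and for $n = 1, 2, \ldots$ we have the extended
Kolmogorov inequality

$$P\left\{ \max_{1 \le k \le n} |Y_k| \ge \varepsilon \right\} \le \frac{1}{\varepsilon^2} \sum_{i=1}^n D\{X_i\} \quad \text{for each } \varepsilon > 0 \quad \text{(Loève, 1955).}$$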
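
The inequality above can be illustrated by a small Monte Carlo experiment. The sketch assumes independent steps taking the values +1 and -1 with equal probability (so each step has mean 0 and variance 1); the path length, the threshold and the number of repetitions are arbitrary illustrative choices.

```python
import random

random.seed(0)
n, eps, reps = 50, 15.0, 5000

def max_abs_partial_sum(n):
    """Largest |Y_k|, k = 1..n, along one path of a +/-1 random walk."""
    total, largest = 0, 0
    for _ in range(n):
        total += random.choice((-1, 1))   # X_i with E X_i = 0 and D{X_i} = 1
        largest = max(largest, abs(total))
    return largest

hits = sum(max_abs_partial_sum(n) >= eps for _ in range(reps))
empirical = hits / reps                   # estimate of P{max |Y_k| >= eps}
bound = n * 1.0 / eps ** 2                # (1 / eps^2) * sum of the variances
print(empirical, "<=", round(bound, 4))
```

The empirical frequency is typically well below the bound, which reflects that the inequality is a worst-case statement over all martingale difference sequences with the given variances.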
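Let $\mathfrak{P}_0$ and $\mathfrak{P}_1$ be two absolutely continuous
disjoint families of probability distributions with respect to a
$\sigma$-finite measure. If there exists a pair
$(Q_0, Q_1) \in \mathfrak{P}_0 \times \mathfrak{P}_1$ such that for the
uniformly most powerful $\alpha$-test $\varphi^*$ for $Q_0$ against $Q_1$,

$$E_{Q_0} \varphi^* = \sup_{Q \in \mathfrak{P}_0} E_Q \varphi^*$$

and

$$E_{Q_1} \varphi^* = \inf_{Q \in \mathfrak{P}_1} E_Q \varphi^*$$

hold, then $\varphi^*$ maximizes $\inf_{Q \in \mathfrak{P}_1} E_Q \varphi$
in the class of all $\alpha$-tests $\varphi$ for

$$H_0 : \mathfrak{P}_0 \quad \text{against} \quad H_1 : \mathfrak{P}_1$$

and $\varphi^*$ is the only test with that property if the uniformly most
powerful $\alpha$-test for $Q_0$ against $Q_1$ is unique (Lehmann, 1959).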


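Let a family $\mathfrak{P}$ of probability distributions $P$ be given with
the distribution functions $F$. Let $X_1, \ldots, X_n$ be a sample from the
distribution function $F(x - \vartheta)$ with $\vartheta \in \mathbb{R}^1$.
A set $D^*$ of estimators for $\vartheta$ is said to be essentially complete
in the generalized minimax sense if there exists for each estimator
$\hat\vartheta$ an estimator $\vartheta^* \in D^*$ with

$$\sup_{P \in \mathfrak{P},\, \vartheta \in \mathbb{R}^1} R(\vartheta^*, \vartheta) \le \sup_{P \in \mathfrak{P},\, \vartheta \in \mathbb{R}^1} R(\hat\vartheta, \vartheta).$$

Here $R(\hat\vartheta, \vartheta)$ denotes the risk function (Zacks, 1971).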


(Generalized theorem by Hunt and Stein)

With the notation of [A 3.5] let the set $D$ of all estimators
$\hat\vartheta$ for $\vartheta$ be a separable metric space with respect to
a suitably chosen topology. Let the loss function $L(\vartheta, d)$ be
nonnegative and of such a kind that the set
$\{d \mid L(\vartheta, d) \le r\}$ is compact for each $r \in \mathbb{R}^1$.
Then the set $D^*$ of the equivariant estimators for $\vartheta$ is
essentially complete (Zacks, 1971).


Let $\{i\} = (i_1, \ldots, i_n)$ and $\{j\} = (j_1, \ldots, j_n)$ be
permutations of $(1, \ldots, n)$. We say that $\{i\}$ is better ordered than
$\{j\}$ if for all $a$ and $b$ with $a < b$ and $j_a < j_b$ also
$i_a < i_b$ follows.


428 


A 3.8. 


A 3.9. 


A 3.10. 


A 3.11. 


A. Appendices 


In case $\{i\}$ is better ordered than $\{j\}$, then it holds for each
$m \le n$ for the ordered $m$-tuples $(i_{1m}, \ldots, i_{mm})$ and
$(j_{1m}, \ldots, j_{mm})$ of $(i_1, \ldots, i_m)$ and
$(j_1, \ldots, j_m)$, respectively, that

$$i_{km} \le j_{km} \quad \text{for all } 1 \le k \le m \text{ and all } 1 \le m \le n.$$


If $\{i\}$ is better ordered than $\{j\}$ and if $a_1 < \ldots < a_n$ denote
real numbers, and $h$ denotes a nondecreasing function, then

$$\sum_{k=1}^n a_k h(i_k) \ge \sum_{k=1}^n a_k h(j_k) \quad \text{(Lehmann, 1966).}$$
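
The notion of one permutation being better ordered than another, and the two consequences stated above, can be checked exhaustively for small n. The sketch below does this for n = 4; the weights a_k and the function h are arbitrary illustrative choices, and the helper names are hypothetical.

```python
from itertools import permutations

def better_ordered(i, j):
    """True if every pair of positions that j puts in increasing order is also increasing in i."""
    n = len(i)
    return all(i[a] < i[b]
               for a in range(n) for b in range(a + 1, n)
               if j[a] < j[b])

def sorted_prefixes_dominated(i, j):
    """Check that the sorted first m entries of i lie below those of j, for every m."""
    return all(u <= v
               for m in range(1, len(i) + 1)
               for u, v in zip(sorted(i[:m]), sorted(j[:m])))

def weighted_sum(a, h, perm):
    """Sum of a_k * h(perm_k) over all positions k."""
    return sum(a_k * h(p) for a_k, p in zip(a, perm))

n = 4
a = [-1.0, 0.5, 2.0, 3.5]      # increasing weights (illustrative)

def h(t):
    """A nondecreasing function (illustrative)."""
    return t ** 3

all_perms = list(permutations(range(1, n + 1)))
pairs = [(i, j) for i in all_perms for j in all_perms if better_ordered(i, j)]
prefix_ok = all(sorted_prefixes_dominated(i, j) for i, j in pairs)
sum_ok = all(weighted_sum(a, h, i) >= weighted_sum(a, h, j) for i, j in pairs)
print(len(pairs), prefix_ok, sum_ok)
```

Every permutation is better ordered than itself, and the identity permutation is better ordered than every other one, so the list of pairs always contains at least these cases.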


Let $F$ be a distribution function and $\Psi$ the set of continuously
differentiable functions $\psi$ on a compact set $C$ with
$\int \psi^2(x) \, dF(x) > 0$. Then the functional

$$I(F) = \sup_{\psi \in \Psi} \frac{\left( \int \psi'(x) \, dF(x) \right)^2}{\int \psi^2(x) \, dF(x)}$$

is convex in $F$ (Huber, 1969).
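Let the random vector $X = (X_1, \ldots, X_n)'$ have the density
$p(x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i)$. Then the vector of the ranks
$R = (R_1, \ldots, R_n)'$ of $X$ and the vector
$X^{(\cdot)} = (X^{(1)}, \ldots, X^{(n)})'$ of the order statistics have the
following distributions:

$$P\{R = r\} = \frac{1}{n!} \quad \text{for } r \in \mathfrak{R},$$

where $\mathfrak{R}$ denotes the set of all permutations of the numbers
$(1, \ldots, n)$, and $X^{(\cdot)}$ has the density

$$q(x) = \begin{cases} n! \prod_{i=1}^n f(x_i) & \text{for } x_1 \le \ldots \le x_n, \\ 0 & \text{otherwise} \end{cases}$$

(Hájek and Šidák, 1967).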


As in [A 3.10], let $X$ have the density $p(x) = \prod_{i=1}^n f(x_i)$. Let
the marginal density $f$ be symmetric about zero. Then the vector of the
sign statistics $\operatorname{sign} X$, $R^+$, and $|X|^{(\cdot)}$ are
stochastically independent and we have

1. $P(\operatorname{sign} X = v) = 2^{-n}$ for $v \in \mathfrak{V}$
($\mathfrak{V}$ is the set of all $n$-vectors the components of which are
either $+1$ or $-1$);


A 3.12. 


A 3.13. 


A 3.14. 


A.3. Addenda 429 


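2. $P\{R^+ = r\} = \frac{1}{n!}$ for $r \in \mathfrak{R}$ (cf. [A 3.10]);

3. $|X|^{(\cdot)}$ has the density

$$q(x) = \begin{cases} 2^n n! \prod_{i=1}^n f(x_i) & \text{for } 0 \le x_1 \le \ldots \le x_n, \\ 0 & \text{otherwise} \end{cases}$$

(Hájek and Šidák, 1967).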

With the numbers $c_1, \ldots, c_n$ and $a(1), \ldots, a(n)$ we consider the
statistic $S = \sum_{i=1}^n c_i a(R_i)$. Here $R = (R_1, \ldots, R_n)'$ is a
random vector with the uniform distribution over the space $\mathfrak{R}$ of
all permutations of $(1, \ldots, n)$. Then it follows that

$$ES = \frac{1}{n} \sum_{i=1}^n c_i \sum_{j=1}^n a(j)$$

and

$$D\{S\} = \frac{1}{n-1} \sum_{i=1}^n (c_i - \bar c)^2 \sum_{j=1}^n (a(j) - \bar a)^2$$

with

$$\bar c = \frac{1}{n} \sum_{i=1}^n c_i \quad \text{and} \quad \bar a = \frac{1}{n} \sum_{i=1}^n a(i) \quad \text{(Hájek, 1969).}$$
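
The expectation and variance formulas for the linear rank statistic can be verified exactly for small n by enumerating all permutations. The constants c_i and scores a(j) below are arbitrary illustrative values; exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import permutations

c = [Fraction(x) for x in (1, 4, 2, 7)]     # constants c_1, ..., c_n (illustrative)
a = [Fraction(x) for x in (0, 1, 3, 10)]    # scores a(1), ..., a(n) (illustrative)
n = len(c)

# Value of S = sum_i c_i a(R_i) for every rank vector R, each having probability 1/n!.
values = [sum(c[i] * a[r[i] - 1] for i in range(n))
          for r in permutations(range(1, n + 1))]
mean_exact = sum(values) / len(values)
var_exact = sum((v - mean_exact) ** 2 for v in values) / len(values)

c_bar = sum(c) / n
a_bar = sum(a) / n
mean_formula = sum(c) * sum(a) / n
var_formula = (sum((x - c_bar) ** 2 for x in c)
               * sum((x - a_bar) ** 2 for x in a)) / (n - 1)
print(mean_exact, mean_formula, var_exact, var_formula)
```

Note the divisor n - 1 in the variance: it arises because the ranks are sampled without replacement, so the scores attached to different observations are negatively correlated.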


Let $f(x)$ be a one-dimensional density and
$\{f(x - \vartheta) \mid \vartheta \in \mathbb{R}^1\}$ the related family of
densities with the location parameter $\vartheta \in \mathbb{R}^1$. We
consider the class $\mathfrak{F}$ of all densities $f$ with the properties:


1. $f(x)$ is continuously differentiable in $x$;
2. $\int x^2 f(x) \, dx < \infty$; and
Sane heer 0 


|z|—>oo 


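Among all densities $f \in \mathfrak{F}$ with constant variance $\sigma^2$
the density
$f^*(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-x^2/(2\sigma^2)}$ has the
smallest Fisher information

$$I(f) = \int \left( \frac{\partial}{\partial\vartheta} \ln f(x - \vartheta) \right)^2 f(x - \vartheta) \, dx.$$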
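
The minimality of the normal density can be illustrated numerically by comparing it with one other smooth density of the same variance. The sketch below uses the logistic density as the competitor and a simple midpoint rule for the integral; the value of sigma, the integration range and the step count are illustrative assumptions, and a single comparison of course proves nothing about the whole class.

```python
import math

def fisher_information(score, density, lo=-60.0, hi=60.0, steps=240_000):
    """Midpoint-rule approximation of the integral of score(x)^2 * density(x)."""
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * width
        total += score(x) ** 2 * density(x) * width
    return total

sigma = 1.5
s = sigma * math.sqrt(3.0) / math.pi      # logistic scale giving variance sigma^2

def normal_pdf(x):
    return math.exp(-x * x / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def normal_score(x):
    """Derivative of ln f for the normal density."""
    return -x / sigma ** 2

def logistic_pdf(x):
    e = math.exp(-abs(x) / s)             # symmetric form avoids overflow for large |x|
    return e / (s * (1.0 + e) ** 2)

def logistic_score(x):
    """Derivative of ln f for the logistic density."""
    return -math.tanh(x / (2 * s)) / s

info_normal = fisher_information(normal_score, normal_pdf)
info_logistic = fisher_information(logistic_score, logistic_pdf)
print(info_normal, info_logistic)
```

Because the family is a location family, the derivative with respect to the location parameter differs from the derivative with respect to x only in sign, so squaring either gives the same integrand.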


Let $X = (X_1, \ldots, X_n)$ be a random matrix with values in
$\mathfrak{M}_{p \times n}$, where the common distribution of the
$p$-dimensional random vectors $X_1, \ldots,$


430 


A 3.15. 


A 3.16. 


A. Appendices 


$X_n$ is absolutely continuous with respect to the $np$-dimensional Lebesgue
measure. Let $A$ be a real symmetric $(n \times n)$-matrix of rank $r$.
Then, for the random matrix $S := XAX'$ the following statement holds almost
surely: we have $r[S] = \min(p, r)$ and the nonvanishing eigenvalues of $S$
are different (Okamoto, 1973).
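
A quick numerical illustration of the rank and eigenvalue statement above, for p = 2 and n = 4. The entries of X are drawn as independent standard normal variables, and the two matrices A (of rank 3 and rank 1) are illustrative assumptions; the helper names are hypothetical and the 2 x 2 eigenvalues are computed from the closed-form expression for symmetric matrices.

```python
import math
import random

random.seed(7)

def quadratic_form(X, A):
    """Return S = X A X' for a p x n matrix X and an n x n matrix A (lists of rows)."""
    p, n = len(X), len(A)
    XA = [[sum(X[u][i] * A[i][j] for i in range(n)) for j in range(n)] for u in range(p)]
    return [[sum(XA[u][j] * X[v][j] for j in range(n)) for v in range(p)] for u in range(p)]

def eigenvalues_2x2(S):
    """Eigenvalues of a symmetric 2 x 2 matrix, larger one first."""
    half_trace = (S[0][0] + S[1][1]) / 2.0
    radius = math.hypot((S[0][0] - S[1][1]) / 2.0, S[0][1])
    return half_trace + radius, half_trace - radius

p, n = 2, 4
X = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]

A_rank3 = [[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, -1, 0], [0, 0, 0, 0]]
a = [1.0, -2.0, 0.5, 3.0]
A_rank1 = [[ai * aj for aj in a] for ai in a]     # a a' has rank 1

eig_rank3 = eigenvalues_2x2(quadratic_form(X, A_rank3))   # expect two distinct nonzero values
eig_rank1 = eigenvalues_2x2(quadratic_form(X, A_rank1))   # expect one nonzero value and one zero
print(eig_rank3, eig_rank1)
```

In floating-point arithmetic the vanishing eigenvalue in the rank-1 case appears as a number of the order of rounding error rather than an exact zero, so any rank decision needs a tolerance.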


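Let $A \in \mathfrak{M}_p^{>}$, $a > 0$ and

$$l(\Sigma) := a \log \det[\Sigma] + \operatorname{tr}[A \Sigma^{-1}].$$

Then

$$\hat\Sigma = A/a = \arg\min_{\Sigma \in \mathfrak{M}_p^{>}} l(\Sigma)$$

(cf. Rao, 1973, 8.a.5.8-10).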
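
The minimising property of A/a can be probed numerically in the 2 x 2 case by comparing the objective at A/a with its value at randomly perturbed positive definite matrices. The matrix A, the constant a and the perturbation size are illustrative assumptions; the check is a sanity test near the minimiser, not a proof of global optimality.

```python
import math
import random

random.seed(3)

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def objective(Sigma, A, a):
    """l(Sigma) = a * log det(Sigma) + tr(A Sigma^{-1}) for symmetric 2 x 2 matrices."""
    d = det2(Sigma)
    inv = [[Sigma[1][1] / d, -Sigma[0][1] / d],
           [-Sigma[1][0] / d, Sigma[0][0] / d]]
    trace = sum(A[i][k] * inv[k][i] for i in range(2) for k in range(2))
    return a * math.log(d) + trace

A = [[4.0, 1.0], [1.0, 3.0]]       # positive definite (illustrative)
a = 5.0
candidate = [[A[i][j] / a for j in range(2)] for i in range(2)]
best = objective(candidate, A, a)

worse_everywhere = True
for _ in range(2000):
    e11, e12, e22 = (random.uniform(-0.1, 0.1) for _ in range(3))
    Sigma = [[candidate[0][0] + e11, candidate[0][1] + e12],
             [candidate[1][0] + e12, candidate[1][1] + e22]]
    if Sigma[0][0] > 0 and det2(Sigma) > 0:      # keep only positive definite trial points
        worse_everywhere = worse_everywhere and objective(Sigma, A, a) >= best - 1e-12
print(best, worse_everywhere)
```

The same objective, up to constants, is the negative log-likelihood of a normal sample as a function of the covariance matrix, which is why the minimiser has the form of a scaled sum-of-squares matrix.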


Let &,, (for the definition of 2, , see [A 1.3]!) be endowed in a canoni- 
cal way with the structure of a compact differentiable manifold 
(Grassmann manifold; see Dieudonné, 1976,'ch. 16). Then: 


(a) For 1<k the mapping @: Mu_yxi > eu O(R) = ALL B)) 
is a homeomorphism of IM,_;),,; onto the open subset e(My_1) x7) 
of Qe 


(b) For a sequence of random variables {£m}mexy With values in 
eid & ns Lo, £y € &, holds if and only if there exists a se- 
quence of random variables {Lm}mey With values in Mi, and 
Do € Mi, such that AR(Ly) = Lm, MEN, A(Ly) = fo, and 
Lyn => Lp. 

(c) For O0<1l<k the mapping 0: & > & x1, Off) := £1 is a 
homeomorphism. 


(a) The mapping uw: Mi. X Mer > Ler, (A, £) := ALF (where AS 
denotes the image of £ under the linear mapping A) is continuous. 


(e) There exists a uniquely determined measure « on the o-algebra $ 
of the Borel sets of &,,, for which o(&,,) = 1, «(B) = a(OB) for 
each B € $ and each orthogonal matrix O € Nti,., (Haar measure; 
see Dieudonné, 1975, ch. 14). 


(f) Let J be an arbitrary element of &,,,. Then the set {f € &) | £ 
+ J = R"} is open in &, (£ + J denotes the linear hull of the 
set £ uJ in R*). 


(g) For a given M € Mi, the set {f € &,| R(M) S F} is a connected 
subset of ,, ;. 


A.3. Addenda ‘ 431 


A 3.17. 


A 3.18. 


A 3.19. 


In addition to the definitions of [A 1.5]—[A 1.7], let 
My = {2 € Mare |Z =((Zy))TBS. ((Zi))iz7 € MP, 


j=1,2 
FZ, S28 Me : ((Zy))e22 = 21+ 22, AT) — F,-HZ = N, 
My {= {M © Woon | R(M) S eis JVM re (Qin, Dr os 
Let a mapping 


fi: My} ME X Moxa X MZ 
£:L8y pg L+I=R? 


be defined as follows: if, for a random vector 2 and J € M%, 

% ~ Nn +p(On+p, ) holds, then let 

(a) SSH Say ee aes) 

for w= (1, Onxr) 20s © 2= (Op x0, In) Zs 

ee te, ep hw , De ND Hire ao ae | Pay 

The mapping f is injective and 

f(MZ) = MF x ME K (Le ME | R(L) = J}. 

Let x = (a@,,...,%,) be a k-dimensional random vector with indepen- 
dent components x; = yw; + &,7 = 1,...,k, # |lax||* << o. 

Leta: (ig, a) Ee 


er et Dia Ore. 01 

@, := (Ee?), Ui Sind peste d ne Mss Es}, 
Then for 

A = ((aj) ith k € My 


it holds that 


k 
Dex' Ax = ¥ ak(p; — 30) + 40’ diag [A] Au 


7=1 


+ 2sp[XALA] 4+ 4u’AL Ay 
(diag [A] = Diag [a1, ..., Gx])- 


Let « = w+ bea k-dimensional random vector with £ |lx||* < oo, 
Ha = p. Let 


W :— Hes’ © &é’, ® := He' & &&’, 2 Dee 
Then it holds that 
Daw’ = ¥ — SE’ + 2p’ © OG) LM, + 22,(u @ ©) + 40 (up’ @ 2) Le 


432 A. Appendices 


(1, as in [A 1.10]). If # is normally distributed, then 
UF ES OS id) Pe oP = Oe ae 


A 3.20. A random variable $X$ is said to be double-exponentially distributed
with the parameters $\alpha$ and $\beta$ if it has the density


Oe seal 


with $-\infty < \alpha < +\infty$ and $0 < \beta < \infty$.
A random variable $X$ has a logistic distribution with the parameters
$\alpha$ and $\beta$ if its distribution function has the form

$$F_X(x) = (1 + \exp[-\beta x - \alpha])^{-1}$$

with $-\infty < \alpha < +\infty$ and $\beta > 0$.


References 


Dieudonné, J. (1975, 1976). Grundzüge der modernen Analysis; 2., 3. Bd., VEB Deutscher
Verlag der Wissenschaften, Berlin.

Hájek, J. (1969). A Course in Nonparametric Statistics. Holden-Day, San Francisco.

Hájek, J., and Šidák, Z. (1967). Theory of Rank Tests. Academia, Prague.

Huber, P. J. (1969). Théorie de l'inférence statistique robuste. Les Presses de l'Université
de Montréal, Montréal.

Jennrich, R. I. (1969). ‘Asymptotic properties of nonlinear least squares estimators.’
Ann. Math. Statist., 40, 633-643.

Kagan, A. M., Linnik, Ju. V., and Rao, C. R. (1973). Characterization Theorems in
Mathematical Statistics. John Wiley, New York.

Lehmann, E. L. (1959). Testing Statistical Hypotheses. John Wiley, New York.

Lehmann, E. L. (1966). ‘Some concepts of dependence.’ Ann. Math. Statist., 37,
1137-1153.

Loève, M. (1955). Probability Theory. Van Nostrand, Princeton.

Okamoto, M. (1973). ‘Distinctness of the eigenvalues of a quadratic form in a
multivariate sample.’ Ann. Statist., 1, 763-765.

Rao, C. R. (1973). Lineare Statistische Methoden und ihre Anwendung. Akademie-Verlag,
Berlin.

Zacks, S. (1971). The Theory of Statistical Inference. John Wiley, New York.


A4 Notation and terminology 


A4.1 Abbreviations 


BAN-estimation sequence  best asymptotic normally distributed estimation sequence
BILUE            best inhomogeneous linear unbiased estimator
BLUE             best linear unbiased estimator
BUE              best unbiased estimator
CIVE             canonical instrumental variable estimator
EVM              errors-in-variables model
GLSE             generalized least squares estimator
IV               instrumental variable
IVE              instrumental variable estimator
LIFU             linear functional relation
LIFU+t           LIFU with nonrandom unobservable variables
LIFU-            LIFU with random unobservable variables
LIML             MLE with limited information
LRT              likelihood ratio test
LSE              least squares estimator
MCE              minimum contrast estimator
MLE              maximum likelihood estimator
MLS              solution of the likelihood equation
MSE              mean square error
OLSE             ordinary least squares estimator
ORLSE            orthogonal least squares estimator
2SLS-estimator   two-stage least squares estimator
WILSA            weighted inadequate least squares approximation
WILSE            weighted inadequate least squares estimator
WLSE             weighted least squares estimator


A4.2 Vectors, matrices, spaces 


((mi;)), ((mj) ryt matrix with elements m;; 


(m,,---, 7M) matrix with columns m,, ..., m, 


((11;;)), Ms, Mo. Bes matrix with submatrices M ;; 


A@®B = ((a;B)) for A = ((a)) 
(Wii we Ls ais M,), Min) = (Mi; ae M,,)' for matrices M; 
(¢ = 1,..., m) with the same number of columns 


(M,)i=*-" = (M,}...1M,) for matrices M,(¢=1,...,n) with the same 
number of rows 


7[M] rank of the matrix M 
Maxk Set of all (n X &)-matrices with real elements 


axe (= {ME Maxe | AM) = 7} 
Mn set of symmetric matrices in Mr,» 
M= set of positive semidefinite matrices in M,, 


WM set of positive definite matrices in Nc, 


IR” n-dimensional Euclidean space 


R>, R= set of positive and nonnegative real numbers, respectively 


N set of all natural numbers 
My set of all (x X p)-matrices with columns in a subset £ of IR” 
A(X) = {XP|B € R*} with X € M,., 


MX) = Marx) 
N({X) .= {pe R*| XP = 0,} with X May, 


L(x, ...,%,%) subspace of IR" generated by the column vectors 2; € IR” 


(Ges Te. .5K) 
M’ transpose of the matrix WM 
M = (M,)io1.....0 if M = (M,)i-*-* with M, € Myr (6 = 1,..., 2) 
mM = (Minh, if M = (Miica,....n With Wy € Mr (@= 1... 2) 
M- generalized inverse of M 
M+ Moore generalized inverse of M 


tr[M]_ trace of M 

det [M7] determinant of J 

A{M] ith eigenvalue of 

Amax{M] largest eigenvalue of M (= 4,[M]) 
Aminf M] smallest eigenvalue of M (= 4,[J1}) 


Diag (21;)i-3,...,n, Diag [,..., M,] block diagonal matrix with the diagonal 
matrices M; ‘ 


we, = M'AM with M € Myym and A € Mn 
|? = = ||, with A = £ 
ipe projection matrix (projector) onto the linear subspace £ € JR” with 


respect to the norm ||-||,, i-e. |lz — P$a|| = min |lz — yl|, (x € IR") 
yea 


I,,2 unity matrix of order n Xn 


ioe = Pin (or, if no confusion is possible, = P#) 

Py =P R(M) 

£, \ £2 orthogonal difference between /, and /, 

Nie _ orthogonal complement of £ 

Lt matrix L with index ‘ortho’, mostly with A(L+) = (A(L))+ 
: (1k TCS RE 


07,0 null-vector in JR" 


A.4. Notations 435 


Onxe, O null-matrix in M,,, 
ee Ay" WALD COM C  oa' Fee Migrma Aie, Wee, 


Sxy = S¥y (or, if no confusion is possible, = 4) 
Sx Ny 

Qxy.z = Sxz87'Szy = XPzY' 

Qx.z = Orx.z 

Sxyzg = Sxy —Qxv.2 = XPisY' 

Sx.z = Sxx.z 

Qxy.uv = XPp,yY' = Sxy~SpSov.v 

J = Up-q 04x (p-a] 

o ie [O.n-2) xa! I) 

Lz = ERS B] 

L; sa ae oe a] 

L_,, &- set of r-dimensional subspaces of R? (p = r) 
Me, union of all Q_, withg<r 
| Qe set of the /-dimensional subspaces of IR* 

0 map from Wy. (pq) > ap—q With o(B) = R(Lz) 


A4.3 Sets and functions 


Sq = (LE Sypg | LE Moxcp_p: if A(L) = F, then rf J'L] = g} 
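$A \subseteq B$  $A$ is a subset of $B$

$A \subset B$  $A \subseteq B$, and there is an $a \in B$ with $a \notin A$

$A \setminus B$  difference between sets

$A \times B$  $= \{(a, b) \mid a \in A,\ b \in B\}$


$A^n$  $= A \times A \times \cdots \times A$ ($n$-fold product of $A$)
$\mathfrak{B}^n$  class of Borel sets in $\mathbb{R}^n$

$\mathfrak{B}_A$  class of Borel subsets of the (Borel) set $A$
$A^{\mathrm{int}}$  set of all inner points of the set $A$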


(X, 2%) measurable space 

$f : \mathfrak{X} \to \mathfrak{Y}$  $f$ is a map from $\mathfrak{X}$ into $\mathfrak{Y}$

{”, f; | jth component of the vector-valued function f, in particular jth com- 
ponent of a vector 

Ass partial derivative of the function f with respect to the 7th component 

ilo—o of O at the point & = % 




436 A. Appendices 


0" f 
(or 


Outs Onf(Hr)> matrix of partial derivatives of f with respect 
Om(1) filma) --- O,(p) fi(ér) to w at the point w= m (f: Rt > Ry, 


) matrix of second partial derivatives of the function f with 
Outs respect to the components of # at the point } = 8 


: Hy & Rp Sr ie = [oO oP Py) 
Ou(1) falta) --- Ou(p) fa(ur)/ / 


; 1 
k,(u) ae a lle— pl/>- for given 2 € M= 
{vj}, {vi}iexy Sequence of the 2; 
A = x; — v_,, where 2;, x;_, are elements of a sequence {xj} icq 
Ax; = x; — X, where x; is an element of a sequence {2} cen 


min f(z), min {f(z)| z € 4} minimum of the function f over 4 
A f(z), max {f(z) |2z€ 4} maximum of the function f over 4 
ANS inf {f(z) | z€ 4} infimum of the function f over 4 

act f(z), sup {f(z) |z€ A} supremum of the function f over 4 
<7 min f(z) = 2* € A with f(z*) = min f(z) 


ze ZEA 
f(z)=> min! minimize f! 
if recA 
IDC eee reais (indicator function of the set 4) 
0, otherwise 
n 2 
TUN eet (n= Sw?) with 
‘ t=1 
Wi (W, 2.5, 0) © IR” anc 
Ys (1, Op Yn) eqk* 
|A| number of elements of 4 
[q] largest integer which is not greater than q € IR} 


A4.4 Random variables and models 


PY, f{y} distribution of y 

f{y | 8} distribution of y for a given parameter # 
prly=y, P*v conditional distribution of # given y = y 
y~P_ yis distributed according to P 

yOQP ywr~P for some PEP 

Ey = i yP¥(dy) (expectation of y) 


A.4. Notations 437 


Epy = { yP(dy) (expectation of y if y ~ P) 
E o =H PJ 
Boy = f y —= exp {—y?/2} dy 
27 

TR? 
Dy = Ely — Ey) (y — Hy)’ (covariance matrix of the random vector y) 
Dsy = Es(y — Egy) (y — Esy)’ (covariance matrix of the random vector 
Cov (#,y) = H(a@ — Ha) (y — Ey)’ (covariance matrix between « and y) 


E(T | %) conditional expectation of T with respect to $ 
K(T | y) conditional expectation of T under y 
°(L, kn = n> SY) wlU(a4) kat) 
t=1 
for 1,4: % > R}, w := {fw |¢ = 1,<..,2; 2 € N} 
= (is i) pag 


for | = (h,...,,)": 2 +R? and k = (h,...,h,)' : L > R! 
“(1,k) —lim (I, b)p 


“tle = “, Ln 
Sey a= LY) 


"ly —U, =n Swirly, — Ua)? 


t=1 
for y = (¥1,--- Yn)’ € IR® andl: Y — R! 
Py characteristic function of the random vector y 
L,(#) likelihood or log-likelihood function for the observation y 
St, 
Sty model as set of structures 
(= {Sty | x € I7}) 


structure with parameter x 


A4.5 Distributions and measures 


N,(u, £), N(u, X) p-dimensional normal distribution with expectation mw and 
covariance matrix 2’ 

Nap oe, x &) A) ag Np(M, 2 @ A) 

@ distribution function of (0, 1) 

W(p, £) I-dimensional Wishart distribution with p degrees of freedom and 
expectational matrix 2’ 

“ central y? distribution with p degrees of freedom 

ees (1 — x) quantile (upper «-point) of the distribution 7; 


29* 


438 A. Appendices» 


F,., central F-distribution with p, and p, degress of freedom 


F y:p,,p, (1 — «) quantile (upper «-point) of the distribution Fp, », 


n 
v1 X%, X ¥; product measure of the measures 7, 7. and 7, ..., ¥,, respectively 


w=1 
n 


yr = X », with y, =» (1 = Lema} 
i=1 
Ly Lebesgue measure on (IR!, $+) 
y<yp vis absolutely continuous with respect to wu 
/jo,1] restriction of the measure 4 to Byo,1; 


R[O, 1] rectangular distribution on [0, 1] 


A4.6 Convergence 


$a_n \xrightarrow[n\to\infty]{} a$, $a_n \to a$  the sequence $\{a_n\}_{n \in \mathbb{N}}$ converges to $a$

$\xrightarrow{P}$  convergence in probability

$\xrightarrow{a.s.}$  convergence almost surely

$P_i \to P$  the sequence of probability measures $\{P_i\}_{i \in \mathbb{N}}$ converges in
distribution to the probability measure $P$

$o(\cdot)$  $a_n = o(b_n)$ if $a_n / b_n \to 0$

$O(\cdot)$  $a_n = O(b_n)$ if $a_n / b_n$ is bounded for all $n$

$o_P(\cdot)$  $x_n = o_P(y_n)$ if $x_n / y_n \xrightarrow{P} 0$

$O_P(\cdot)$  $x_n = O_P(y_n)$ if for any $\eta > 0$ there exist some $q < \infty$ and some
$n_0 \in \mathbb{N}$ with $P\{|x_n / y_n| \le q\} \ge 1 - \eta$ for all $n \ge n_0$

lim sup, lim inf  limit superior, limit inferior, respectively


A 4.7 Sample functions 


(a) Simple classified data (x; €« R®, 7 = 1,..., n) 


Xn) = (%i)ini,...,n 
n 

x, = 25; 

i=1 
i = a./n (sample mean of 2;,)) 
x = (x, —&.,..., %, — @.) (sample residuals) 
LD = S; (sample covariance matrix) 
D 2e8. 


PP 
RY-2 


A.4. Notations 439 


ms 

vi, => Liz 
j=1 
n 

xj = ay 
i=1 


8 

| 
Ms 
Mes 
& 


i=1 1=1 

%; = a; /m; 

a == / 10 

Xi. = (In, © V4, +++) Im, © &n) 

Xi. SNC Ry aaa ed 

X(n) = [%,, +--+ Xn 

ise = Sg; with & = x — Z%_ (sample covariance) 

Wx Se nhOn hp ae p 

W, = 8, for %; = x; — Z;, (sum of squares in the classes) 
B; = S; for %; = %;, — %, (sum of squares between the classes) 
ahs See) ia, CER 

We = W,, i vi, ¢ IR 


b, = Bz, if x; ¢€ R! 


Author index 


Acton, 242 

Adcock, 240 

Adichie, 139, 140 

Agarwal, 89, 116 

Agha, 23 

Ahlberg, 89

Aigner, 246 

Akahira, 46 

Amari, 45 

Andersen, 317, 325, 360 

Anderson, S. L., 135

Anderson, T. W., 210, 214, 227, 228, 241, 
242, 243, 246, 250, 254, 266, 267, 280, 
285, 361, 362, 365, 371, 372, 373, 384, 
386, 388 

Andrews, 135 

Anscombe, 201 


Bacon, 75, 104 

Bahadur, 49, 420, 421 

Balakrishnan, 119 

Banerjee, 245 

Bard, 22, 210 

Barham, 26 

Barnard, 75 

Barnett, 31, 216, 242, 246, 252, 254, 255, 
370 

Bartlett, 244, 245, 246, 281 

Basman, 242 

Bassett, 138 

Basu, 246 

Bates, 45 

Beale, 72 

Beran, 109, 189, 191 

Berkson, 246 

Bickel, 134, 137 

Birch, 246, 255 

Birgé, L., 124 

Borodjuk, 75 


Box, G. E.P., 73, 134 

Box, M. J., 25 

Brennan, 244, 246 

Britt, 224, 244, 275, 395 

Broemeling, 103 

Brown, G. H., 216, 242, 243 

Brown, R. L., 76, 87, 246 

Brundy, 359 

Bunke, H., 19, 21, 22, 27, 28, 29, 31, 40, 41, 
47, 48, 50, 83, 90, 92, 134, 136, 137, 166, 
171, 186, 192, 214, 218, 220, 227, 284, 
298, 301, 307, 318, 334, 357, 358, 386, 
387, 388, 417, 422, 424 

Bunke, O., 19, 21, 22, 27, 28, 31, 40, 47, 48, 
50, 83, 90, 106, 134, 136, 137, 166, 171, 
186, 192, 214, 218, 220, 227, 284, 298, 
301, 307, 318, 334, 357, 358, 386, 387, 
388, 417, 422, 424 

Buse, 89, 91 


Carleton, 75, 104 
Carlson, 246 

Carroll, 138 

Casson, 266 

Chao, 115 

Chan, L. K., 216, 242, 252, 253, 255 
Chan, N. N., 243 
Chanda, 59 

Cheng, 115 

Chernoff, 137 
Chibisov, 46 

Choudry, 23 

Chow, G. C., 83 
Chow, Y. S., 196, 426
Clutton-Brock, 243 
Collomb, 115, 124 
Cook, 243 

Copas, 241 

Coutie, 73 




Cox, D. D., 116, 117, 123, 124

Cox, D. R., 22

Cox, N. R., 210, 216, 242, 246, 250, 251 
Craven, 116, 122, 124 

Creasy, 246 


Dathe, 89 

Davis, 121 

de Gracie, 245 

Deming, 243, 396 

Dent, 241 

Dieudonné, 259, 430 

Dionne, 190 

Dolby, 210, 216, 242, 243, 244, 250, 253, 
254, 257, 275, 278, 364, 369, 370, 396 

Dorff, 244, 245, 246, 283, 353, 361 

Drane, 26 

Draper, 28 

Drion, 245 

Dunicz, 75 

Durbin, 76, 87

Dutter, 150, 152 


Eder, 73, 74 
Egerton, 277, 394 
Eicker, 40
El-Sayyad, 246 
Ertel, 89, 105 
Evans, 76, 87 


Fair, 75 

Farebrother, 285 

Farley, 76, 95 

Feder, 76, 95, 102 

Fedorov, 31, 246 

Fedotov, 109 

Feldstein, 245, 282, 362 

Fereday, 246 

Ferreira, 103 

Fisher, 272, 386 

Florens, 241, 246, 267 

Forsythe, 393 

Fowlkes, 89, 105 

Freeman, 244, 278, 364, 370 

Frisch, 245 

Fujikoski, 386 

Fuller, 75, 76, 92, 211, 243, 244, 245, 285, 
286, 291, 361, 374, 378, 381, 384, 397 


Gallant, 69, 72, 75, 76, 92 
Garbade, 88 

Gasser, 114, 115, 121, 123 
Gastwirth, 137 


Geary, 242, 245, 246 
Geertsema, 196, 201 
Geman, 124 

Ghosh, 196, 199, 201 
Gibson, 245 

Gini, 245 

Girko, 272 

Gleser, 196, 242, 265, 266 
Goldberger, 23, 29, 314 
Goldfeld, 23, 73, 76, 103 
Golubev, 120, 121, 123 
Griliches, 243 
Grossmann, 45 

Guarian, 245 

Gurland, 244, 245, 246, 283, 353, 361 
Guthery, 81 


Hackl, 88 

Hájek, 107, 169, 170, 173, 177, 187, 189, 
190, 418, 419, 424, 425, 428, 429 

Halmos, 425 

Hall, 124 

Hampel, 135 

Halperin, 245, 246 

Halpern, 105 

Hamilton, 45 

Hannan, 31, 242, 369 

Härdle, 124, 125

Henschke, 22 

Hey, E. N., 243 

Hey, M. H., 243 

Hinich, 76, 95

Hinkley, 76, 91, 94, 95, 102 

Hoadley, 317 

Hodges, 139, 140, 143, 179, 184, 189 

Hoffmann, 28 

Holbert, 103 

Höschel, 210, 237, 243, 244, 258, 262, 264, 
271, 275, 384, 396 

Housner, 244, 246 

Houwelingen, van, 362 

Hsu, 386 

Huber, 45, 135, 138, 139, 141, 143, 150, 
152, 160, 428 

Hudson, 76, 91 

Hušková, 191, 202

Hwang, 124 


Ibragimov, 107, 108, 111ff., 317, 360, 420, 
421 
Izenman, 314 


Jaeckel, 139, 140, 178 
Jaffee, 75 




James, 272 

Jeeves, 235, 245 

Jennrich, 30, 42, 44, 425, 426 

Johns, 137

Johnston, 210

Jorgenson, 359 

Jowett, 245 

Jung, 137 

Jupp, 89, 92 

Jurečková, 138, 139, 140, 157, 174, 177, 
182, 202, 419 


Kadane, 243, 373 

Kagan, 134, 234 

Kalbfleisch, 362 

Kato, 415 

Kendall, 210, 219, 231, 239, 245, 246, 252, 
255, 265, 371, 388 

Khasminski, 107, 108, 111ff., 317, 360, 
420, 421 

Kiefer, 241, 246, 255, 283 

Klebanov, 362 

Koenker, 138 

Koopmans, 241, 245 

Koryakin, 115 

Koul, 139, 179 

Kraft, 140, 179 

Kruskal, 48 

Kummel, 240, 245 


Laha, 235, 271 

Lauter, 44 

Lawton, 26 

Laycock, 274, 394 

Leadbetter, 121 

LeCam, 362 

Lehmann, 135, 137, 139, 140, 143, 179, 184, 
189, 198, 427, 428 

Lezki, 75 

Li, 124

Liero, 124

Lim, 89, 91

Lin, 115

Lindley, 219, 241, 245, 246, 253 

Linnik, 134, 234

Lipton, 244, 278, 370 

Loève, 240, 383, 384, 427

Lord, 245 

Luecke, 224, 244, 275, 395

Lukacs, 235 

Lyttkens, 246


MacDonald, 244, 398 

MacNeill, 88 

MacRae, 417 

Madansky, 210, 215, 239, 245, 246, 254, 
255, 280, 283, 353, 361 

Magnus, 417 

Mak, 216, 242, 252, 253, 255 

Makowski, 109 

Malinvaud, 30, 31, 209, 210, 219, 239, 279, 
362 

Mariano, 242, 243, 285, 371, 373 

Marron, 125 

McGee, 75, 104 

McGilchrist, 23 

Melamed, 362 

Michel, 46, 318, 355, 422 

Mikhail, 242 

Millar, 109 

Moberg, 241, 254 

Moran, 210, 246 

Mouchart, 241, 246 

Müller, P. H., 89

Müller, H.-G., 114, 115, 121, 123


Nagar, 243 

Nair, 245 

Nelder, 23 

Nemirovski, 124

Neudecker, 417 

Neyman, 241, 245, 283, 310, 325, 361 

Nilson, 89 

Nolle, 258, 365, 421 

Nowak, 246 

Nussbaum, 122, 123, 240, 242, 245, 262, 
266, 317 


Okamoto, 430 
O’Neill, 214, 244, 392, 393, 396


Park, 89 

Patefield, 216, 242, 362, 370, 373 

Paul, 75 

Pázman, 258

Pearson, 134, 240, 245 

Penev, 244, 396 

Pfanzagl, 45, 258, 318, 325, 353, 354, 355, 
422 

Philippou, 317, 360 

Pinsker, 118, 121, 122, 124 

Poirier, 75, 76, 89, 91 

Polyak, 124 




Powell, 244, 398 
Prakasa Rao, 124 
Priestley, 115 


Quandt, 23, 73, 76, 83, 84, 85, 103 


Ragozin, 117 

Ramage, 243 

Ramsey, 85, 103 

Rao, 134, 218, 230, 234, 235, 245, 260, 275, 
416, 430 

Rasch, 23 

Reiersøl, 210, 233, 235, 244, 245, 246

Relles, 138 

Rice, 117, 125 

Richard, 241, 244 

Richardson, 242, 246 

Ringstad, 243 

Robbins, 196 

Robertson, 243, 361 

Robinson, P. M., 211, 219, 231, 242, 243, 
246, 271, 287, 289, 291, 314, 355, 369, 
375, 376 

Robinson, S. M., 243 

Robinson, 76 

Rosenblatt, 117 

Roussas, 317, 360, 420 

Roy, 84 

Rubin, 241, 242, 254, 267, 280, 310 

Ruppert, 138 


Sacks, 123 

Saleh, 23 

Sargan, 242, 245 

Sawa, 243, 285, 371 

Schipper, 362 

Schmidt, P., 83 

Schmidt, W. H., 31, 46, 67 

Schneeweiss, 210, 362 

Schönfeld, 210, 317, 361

Schulze, 75, 76, 92, 104 

Schwetlick, 274, 390 

Scott, 241, 245, 283, 325, 361 

Sen, 135, 196, 199, 201, 202 

Serfling, 135 

Shorack, 137 

Sickles, 83 

Šidák, 169, 173, 190, 418, 419, 424, 425, 
428, 429 

Sinclair, 214, 244, 392 

Smith, F. J., 214, 244, 392, 394 

Smith, H., 29 

Sobel, 246 


Solari, 241, 254, 267 

Southwell, 244 

Speckman, 125 

Spiegelman, 245, 310 

Sprent, 75, 93, 94, 210, 241, 243, 256, 362 

Sprott, 362 

Srivastava, 245 

Stein, 179, 196 

Stigler, 134, 137 

Stone, 112, 114, 123 

Strasser, 258, 289, 364 

Strawderman, 123 

Striby, 22 

Stuart, 210, 219, 231, 245, 246, 252, 255, 
265, 371, 388 

Studden, 89, 116 

Sugiura, 373 

Sundberg, 241, 254 

Susarla, 124 

Sylvestre, 26 


Takeuchi, 46 

Teissier, 254 

Thalheim, 85, 86 

Theil, 243, 245, 285, 317, 361 
Thomson, 245 

Tiller, 244, 396, 398 

Tintner, 242

Toyoda, 83 

Tsybakov, 124 

Tukey, 134, 245, 246, 283, 353, 361 


Utreras, 116, 122, 124 
Uven, von, 241 


Van der Linde, 106 

Van Eeden, 140, 176, 179, 189, 190 
Veitch, 386 

Villegas, 242, 244, 246, 271, 291, 359 
Wahba, 116, 117, 122, 124 

Wald, 244, 245, 280, 281, 325, 364 
Walter, 124 

Ware, 240, 245, 279, 282, 362 
Watson, 121, 242, 246, 265, 266 
Watts, 45, 75, 104 

Weiss, 279, 285 

Willassen, 241, 267 

Williams, 243, 246

Williamson, 214, 243, 391 
Wisotzki, 22

Witting, 258, 365, 421 

Wold, 89 


Wolfowitz, 241, 245, 279, 283, 285, 310
Wolter, 211, 244, 291, 378, 381, 384, 397

Wu, C.-F., 44

Wu, D. M., 244


Yayatissa, 83
Yefroimovich, 121, 124
Yzeren, van, 245


Zacks, 365, 427
Zellner, 246, 314
Zwanzig, 31, 46


Subject index 


abrupt state switching, 78 

adaptation of nonlinear least squares 
algorithm, 149 

adaptation of nonlinear weighted least 
squares algorithm, 150 

adaptive rank estimator, 187 

adaptive smoothing, 124 

adequate model, 21 

algorithm H, 149 

algorithm S, 150

algorithms for WLSE, 243

algorithm W, 150 

α-test, asymptotic, 64, 425

α-test, asymptotically most powerful, 169

α-trimmed least squares estimator, 138

α-trimmed means, 137

approgression, 31 

— estimation, 31 

asymptotically efficient, 421 

— — estimator, 184 

— — Qm-estimator, 347

asymptotically most powerful α-test, 169

asymptotically normal, 180 

asymptotically normally distributed sequence of estimators, 49

asymptotically P-equivalent vectors, 186 

asymptotic α-test, 64, 425

asymptotic distribution, 191, 193

— — of WLSE, 39 

— — of WILSE, 39 

asymptotic efficiency, 63 

— — of the CIVE, 343 

— — of rank confidence intervals, 196 

asymptotic minimax constant, 121 

asymptotic minimax properties, 160 

asymptotic normality, 336f. 

— — of M-estimators, 152 

— — of R-estimators, 179 


asymptotic normality of WLSE in models 
with fixed experimental design, 367 

asymptotic optimality, 47 

— — of WLSE, 368 

asymptotic Qm-estimator, 319, 345

asymptotic relative efficiency, Pitman, 195 

asymptotics for MLE, 364 

asymptotics for WLSE, 365 

asymptotics under fixed experimental 
design, 363 

asymptotic variance, 160 


BAMSET, 85 

BAN sequence of estimators, 49 

Bartlett’s M Specification Error Test, 85 

Bayes’ inference, 246 

Behrens-Fisher problem, generalized, 83 

best asymptotically normally distributed 
sequence of estimators, see BAN 

best inhomogeneous linear unbiased estimator, see BILUE

best linear unbiased estimator, see BLUE 

best unbiased estimator, see BUE 

better ordered permutations, 427 

BILUE, 432 

bivariate LIFU, 247 

— — with normally distributed incidental 
parameters, 242 

— — with replicated observations, 249 

bivariate LIFU+, 241 

bivariate LIFU-, numerics for, 392 

BLUE, 432 

boundary modification, 123 

bounded length confidence intervals, 199 

breakdown point, 135 

BUE, 433 


canonical IVE, see CIVE 
change index, 78 




change of state, 76 

— — —, continuous, 76 

change point, 76 

CIVE, 295, 301, 316, 433 

—, asymptotically efficient, 343 

—, consistency, 325 

Cobb-Douglas model, 23 

common state manifold, 229 

comparison of estimators, 371 

complete, essentially, 427

confidence intervals for regression coefficients, 194

confidence regions in LIFU+, 388

consistency, 238, 424 

— of confidence regions, 389 

— of the CIVE of the structural parameter, 325

— of the OLSE, 376 

— of the variance estimator, 34, 100 

— of the WILSA, 34, 99 

— of the WILSE, 34, 100

— of the WLSE, 34, 96 

—, strong, 239 

— theorem for MLE in models with fixed 
experimental design, 364 

—, weak, 59 

contaminating distribution, 141 

contiguous sequences of densities, 418 

contiguous sequences of probability distributions, 418

continuous change of states, 75 

contrast function, 422 

cross-validation, 124 

cumulative sum test, 86 

— — —, simple, 86 

— — —, quadratic, 87 


design, point, 21 

deterministic functional model, 221 

difference of linearized rank estimators 
and R-estimators, 193 

difference of M- and R-estimators, 191 

distribution-free test, 164 

double-exponentially distributed random 
variable, 432 


efficiency, asymptotic, 63, 343, 421 
efficiency, Pitman asymptotic relative, 195 
ellipsoid, 118 

embedding into a model, 26 
ε-contaminated normal distribution, 163
ε-contamination, 160

errors-in-equations model, 209 


errors-in-variables model, 209 

essentially complete set of estimators, 427 

estimation, 24 

— of the regression parameters, 136 

— of the structural parameters, 369 

estimators under a nonrandom experimental design, 238

exogeneous variable, 227 

experimental design, 21, 218 

— —, identifying, 237 

explicit functional relation, 225 

explicit functional relations with random 
incidental parameters, 231 

exponential model, 23

extended Kolmogorov inequality, 427 


Fisher information, 429 

functional model, 214 

— —, deterministic, 221 

functional relation, 209, 217, 223 

— —, explicit, 225 

— — with nonrandom experimental design, 223

— — with random incidental parameters, 231


Gaussian white noise, 118 

Gauss-Newton iteration, 212 

Gauss-Newton procedure, 395 

generalized Behrens-Fisher problem, 83 

generalized least squares estimator, see 
GLSE 

GLSE, 24, 433 

— and MLE, 256

— for the structural parameter in LIFU+, 260

GLSE in models with errors-in-variables, 
212 

grouping estimator, 280 

grouping method by Wald, 245 

growth curve, 22 


Haar measure, 430 
heteroscedastic variances, 24 
homoscedastic variances, 43 
hypothesis of randomness, 165 
hypothesis of symmetry, 165 


identifiability for nonlinear models, 237 
identifiability in LIFU+, 232 
identifiability, local, 377 

identifiability of a structural bundle, 276 




identifiability of the structural parameter, 
236 

identifiability in the model, 301 

identifying experimental design, 237 

identifying extremal point, 276 

ill-posed problems, 109 

inadequate least squares approximation, 
25 

inadequate LSE, 25

incidental parameters, 223, 225 

— —, in a distribution model, 324

— —, infinitely many unknown, 324 

— —, random, 231 

inconsistency of OLSE, 218 

information matrix, 365 

inhomogeneous LIFU, 297 

instrumental variable, see IV

— — estimator, see IVE 

— — estimator, canonical, see CIVE 

internally identifiable model, 237 

interpolation spline, 121 

IV, 281, 433

IVE, 244, 433

— in LIFU+, 315

— in LIFU-, 309

— in models with errors-in-variables, 213


k-class estimator, 243, 285, 375 

Kepler model, 221 

kernel estimator, 110 

Kolmogorov inequality, 427 
Kolmogorov’s Specification Error Test, 85 
KOMSET, 85 


least squares approximation, 25 

— — —, weighted inadequate, see WILSA 
least squares estimator, see LSE 

— — —, generalized, see GLSE 

— — —, ordinary, see OLSE 

— — —, orthogonal, see ORLSE 

— — —, two-stage, see 2SLS-estimator 
— — —, weighted, see WLSE 

— — —, weighted inadequate, see WILSE 
Lebesgue theorem, 425 

lemma by Chow, 425 

lemma by Le Cam, 418 

lemma of Fatou, 425 

L-estimator, 137 

LIFU, 215, 433

—, bivariate, 247 

—, bivariate without replications, 253 

—, bivariate with replications, 249 

— with independent errors, 279 


LIFU with independent errors, IVE, 282 

— with independent errors, LIML, 283 

— with independent errors, OLSE, 279

— with independent errors, ORLSE, 279 

— with independent errors, variance components, 283

— with independent errors, 2SLS estimate, 283

LIFU+, 225, 433

—, equivariance of WLSE, 263

—, numerics, 390

— under independent normally distributed errors, 266

— with known covariance, 258 

— with linear regression part, 264 

— with normally distributed errors, 241 

—, WLSE for, 259 

LIFU-, 433 

likelihood ratio, 419 

— — statistic, 84 

— — test, see LRT 

limit distribution of WLSE, 40 

limits of experiments, 109 

LIML, 283, 433 

— in LIFU+, 283

— in models with errors-in-variables, 213 

linear filtering, 120 

linear functional relation, see LIFU 

— — — with nonrandom unobservable variables, see LIFU+

— — — with random experimental design, see LIFU-

— — — with random unobservable variables, see LIFU-

linearization of a model, 26 

linearized rank estimator, 185, 193 

— — —, asymptotically efficient, 186

— — —, asymptotically normally distributed, 186

— — — of regression coefficients, 179 

linear rank statistic, 423 

linear regression model, 22 

linear smoothing methods, 109 

linear structural relation, 216 

local asymptotic minimax, 107 

locally asymptotically normal, 420 

locally most powerful α-test, 166

locally most powerful rank α-test, 167

locally most powerful signed-rank test, 169 

locally most powerful test, 167 

location submodel, 137 

logistic distribution, 420 

LRT, 69, 433 




LRT, approximate, 72 
LSE, 24, 433 


Mallows’ Cp-statistic, 105

martingale, 426 

maximum likelihood estimator, see MLE 

— — — with limited information, see 
LIML 

maximum likelihood method, 160 

MCE, 258, 353, 375, 422, 433 

mean square error, see MSE 

measurability of MLE, 258 

measurability of WLSE, 258 

measure of concentration, 285 

M-estimator, 138, 141, 191 

method of cumulants by Geary, 245 

minimax property, asymptotic, 160 

minimax property of M-estimators, 142 

minimum contrast estimator, see MCE 

MLE, 24, 56, 278, 420, 433 

— and LSE, 256 

— and WLSE, 256

— for bivariate LIFU, 248

— in LIFU+, 256, 303

— in models with errors-in-variables, 212

—, measurability, 258 

— under unknown error covariance, 241 

— with limited information, see LIML 

MLS, 433 

model, adequate, 21 

—, exponential, 23 

— of contamination, 141 

— of indeterminacy, 141 

— ordered with abrupt state switching, 78 

— with a stochastic mechanism of assignment, 103

— with continuous changes of state, 76, 96

— with continuous state switching, 89 

— with errors-in-variables, 212 

— with generalized nonrandom IV, 321 

— with linear state function, 77 

— with nonrandom experimental design, 
223, 238 

— with nonrandom incidental parameters, 224

— with replicated observations, 318 

modified Gauss-Newton procedure, 291, 
397 

modified k-class estimator, 286 

modified LSE in nonlinear models, 291 

modified MLE in LIFU, 285 

modified MLE in LIFU+, 373 

modified Newton-Raphson algorithm, 392 


Monte Carlo considerations, 135 
MSE, 433 

multicompartment systems, 23 
multivariate LIFU+, numerics, 392
— — with known covariance, 259 


Newton-Raphson algorithm, 392

— — —, modified, 392

Newton-Raphson procedures, 397

nonlinear model with increasing experimental design, 378

nonparametric regression, 105 

nonparametric test, 164 

normal distribution, ε-contaminated, 163

normal model, 27 


OLSE, 24, 433 

—, bias in quadratic models, 243 

— in LIFU with independent errors, 279 
optimality theorem for MLE, 364 

order statistic, 164, 428 

ordinary estimator, 359 

ordinary LSE, see OLSE 

ORLSE, 257, 433 

overidentified likelihood equations, 255 


parameter function, identifiable in the 
model, 301 

P-equivalent vectors, asymptotically, 186 

piecewise polynomial, 113 

Pitman asymptotic relative efficiency, 195 

Pitman’s estimator, 134 

p-function, 146 


Qm-estimator, asymptotic, 345 
Qm-estimator, asymptotically efficient, 347


rank, 423, 428 

rank confidence interval, 196 
rank estimator, adaptive, 187 
rank of a random vector, 164 
Rank Specification Error Test, 85 
rank statistic, 423 

rank test, 139, 164 

— —, asymptotic behaviour, 170 
RASET, 85 

rate of convergence, 108 
regressand, 21 

regression function, 21 

regression model, 217 

— —, linear, 22 

regression OLSE, 219 

Regression Specification Error Test, 85 




regressor, 21, 228 
reparametrization, 28 

RESET, 85 

R-estimator, 136, 163, 191, 193 
—, asymptotic normality, 179 
— of regression coefficients, 178 
risk function, 427 

robustness, 134 


sample median, 146 

score function, 139 

scoring method, 244 

sequence of estimators, best asymptotically 
normally distributed, 49 

set of admissible change indices, 79 

set of admissible change points, 77 

signed-rank statistic, simple linear, 169 

signed-rank test, 169 

— —, asymptotic behaviour, 170 

— —, uniform asymptotic linearity, 176, 
177 

sign statistic, 165, 428 

simple linear rank statistic, 169 

simple linear signed-rank statistic, 169 

simplified Gauss-Newton procedure, 396 

simultaneous equations, 277

smoothing, 110 

smoothness class, 108 

solution of the likelihood equation, see 
MLS 

Specification Error Test, 85 

spline regression, 89 

spline smoothing, 113 

state, 76 

— equation, 221 

— function, 76, 221 

— manifold, 221 

— manifold, common, 229 

— number, 76 

— parameter, 76 

— switching, abrupt, 78 

stochastic regressors, 217 

strongly consistent estimator, 239 

structural bundle, 221 

— —, identifiable by the critical points, 
276 

structural parameter in a distribution 
model, 325 

structural relation, 209 

structure, 221 

structure-identifiable model, 237 


structure parameter, 221

sum test, cumulative, 86 
system-describing variables, 221 
system equation, 221 

system function, 221 

system manifold, 229 

system parameter, 221 


testing the presence of a state switching, 82 
test on restrictions in the model, 385 

test on the dimension of the subspace, 385 
test on the existence of a LIFU, 387 
tests for model testing, 85 

theorem by Hunt and Stein, 427 
trimmed mean, 134 

truncated Fourier series, 110 
2SLS-estimator, 242, 433 

— in LIFU with independent errors, 283 
—, modified, 351 

two-stage estimator, 24 

two-stage LSE, see 2SLS-estimator 


unbiased linear estimator, 27 
uniformly consistent, 108 
union-intersection principle, 84 
updating technique, 81 


variances, heteroscedastic, 24 
—, homoscedastic, 43 


weak consistency, 59 

weakly convergent sequence of distributions, 417

weakly increasing size of the experimental 
design, 378 

weighted inadequate least squares approximation, see WILSA

weighted inadequate LSE, see WILSE 

weighted LSE, see WLSE 

Wilcoxon test, 139, 184, 199 

WILSA, 25, 433 

WILSE, 25, 433 

WLSE, 24, 256, 433 

—, approximate, 92 

— in LIFU+, 259 

— in models with continuous state switching, 90

— in models with errors-in-variables, 212

— in nonlinear models, 243

—, measurability, 258


Applied Probability and Statistics (Continued) 


DILLON and GOLDSTEIN . Multivariate Analysis: Methods and 
Applications 

DODGE - Analysis of Experiments with Missing Data

DODGE and ROMIG - Sampling Inspection Tables

DOWDY and WEARDEN - Statistics for Research

DRAPER and SMITH - Applied Regression Analysis, Second Edition 

DUNN - Basic Statistics: A Primer for the Biomedical Sciences, 
Second Edition

DUNN and CLARK - Applied Statistics: Analysis of Variance and 
Regression, Second Edition

ELANDT-JOHNSON and JOHNSON . Survival Models and Data 
Analysis 

FLEISS - Statistical Methods for Rates and Proportions, Second 
Edition 

FLEISS - The Design and Analysis of Clinical Experiments 

FLURY - Common Principal Components and Related Multivariate 
Models 

FOX - Linear Statistical Models and Related Methods 

FRANKEN, KONIG, ARNDT, and SCHMIDT - Queues and Point 
Processes 

GALLANT - Nonlinear Statistical Models 

GIBBONS, OLKIN, and SOBEL - Selecting and Ordering 
Populations: A New Statistical Methodology 

GNANADESIKAN - Methods for Statistical Data Analysis of 
Multivariate Observations 

GREENBERG and WEBSTER - Advanced Econometrics: A Bridge 
to the Literature 

GROSS and HARRIS - Fundamentals of Queueing Theory, Second 
Edition 

GROVES, BIEMER, LYBERG, MASSEY, NICHOLLS, and 
WAKSBERG - Telephone Survey Methodology 

GUPTA and PANCHAPAKESAN - Multiple Decision Procedures: 
Theory and Methodology of Selecting and Ranking Populations 

GUTTMAN, WILKS, and HUNTER - Introductory Engineering 
Statistics, Third Edition 

HAHN and SHAPIRO - Statistical Models in Engineering 

HALD - Statistical Tables and Formulas 

HALD - Statistical Theory with Engineering Applications 

HAND : Discrimination and Classification 

HEIBERGER - Computation for the Analysis of Designed 
Experiments 

HOAGLIN, MOSTELLER and TUKEY - Exploring Data Tables, 
Trends and Shapes 

HOAGLIN, MOSTELLER, and TUKEY - Understanding Robust 
and Exploratory Data Analysis 

HOCHBERG and TAMHANE - Multiple Comparison Procedures 

HOEL - Elementary Statistics, Fourth Edition 

HOEL and JESSEN - Basic Statistics for Business and Economics, 
Third Edition 

HOGG and KLUGMAN - Loss Distributions 

HOLLANDER and WOLFE - Nonparametric Statistical Methods

IMAN and CONOVER - Modern Business Statistics 

JESSEN - Statistical Survey Techniques 

JOHNSON - Multivariate Statistical Simulation 


Applied Probability and Statistics (Continued) 


JOHNSON and KOTZ - Distributions in Statistics 
Discrete Distributions 
Continuous Univariate Distributions — 1 
Continuous Univariate Distributions— 2 
Continuous Multivariate Distributions 
JUDGE, HILL, GRIFFITHS, LUTKEPOHL and 
LEE - Introduction to the Theory and Practice of Econometrics, 
Second Edition 
JUDGE, GRIFFITHS, HILL, LUTKEPOHL and LEE - The 
Theory and Practice of Econometrics, Second Edition 
KALBFLEISCH and PRENTICE - The Statistical Analysis of 
Failure Time Data 
KISH - Statistical Design for Research 
KISH - Survey Sampling 
KUH, NEESE, and HOLLINGER - Structural Sensitivity in 
Econometric Models
KEENEY and RAIFFA - Decisions with Multiple Objectives 
LAWLESS - Statistical Models and Methods for Lifetime Data 
LEAMER - Specification Searches: Ad Hoc Inference with 
Nonexperimental Data 
LEBART, MORINEAU, and WARWICK - Multivariate Descriptive 
Statistical Analysis: Correspondence Analysis and Related 
Techniques for Large Matrices 
LINHART and ZUCCHINI - Model Selection 
LITTLE and RUBIN - Statistical Analysis with Missing Data 
McNEIL - Interactive Data Analysis 
MAGNUS and NEUDECKER - Matrix Differential Calculus with 
Applications in Statistics and Econometrics 
MAINDONALD : Statistical Computation 
MALLOWS : Design, Data, and Analysis by Some Friends of Cuthbert 
Daniel 
MANN, SCHAFER and SINGPURWALLA - Methods for 
Statistical Analysis of Reliability and Life Data 
MARTZ and WALLER - Bayesian Reliability Analysis 
MASON, GUNST, and HESS - Statistical Design and Analysis of 
Experiments with Applications to Engineering and Science 
MIKE and STANLEY - Statistics in Medical Research: Methods and 
Issues with Applications in Cancer Research 
MILLER - Beyond ANOVA, Basics of Applied Statistics 
MILLER - Survival Analysis 
MILLER, EFRON, BROWN, and MOSES - Biostatistics Casebook 
MONTGOMERY and PECK - Introduction to Linear Regression 
Analysis 
NELSON - Applied Life Data Analysis 
OSBORNE - Finite Algorithms in Optimization and Data Analysis 
OTNES and ENOCHSON - Applied Time Series Analysis: Volume I, 
Basic Techniques 
OTNES and ENOCHSON - Digital Time Series Analysis 
PANKRATZ - Forecasting with Univariate Box-Jenkins Models: 
Concepts and Cases 
PLATEK, RAO, SARNDAL and SINGH - Small Area Statistics: An 
International Symposium 
POLLOCK - The Algebra of Econometrics 
RAO and MITRA - Generalized Inverse of Matrices and Its Applications
RENYI - A Diary on Information Theory 


Applied Probability and Statistics (Continued)


RIPLEY - Spatial Statistics 

RIPLEY - Stochastic Simulation 

ROSS - Introduction to Probability and Statistics for Engineers and 
Scientists

ROUSSEEUW and LEROY - Robust Regression and Outlier 
Detection 

RUBIN - Multiple Imputation for Nonresponse in Surveys 

RUBINSTEIN - Monte Carlo Optimization, Simulation, and 
Sensitivity of Queueing Networks 

RYAN : Statistical Methods for Quality Improvement 

SCHUSS - Theory and Applications of Stochastic Differential 
Equations 

SEARLE - Linear Models 

SEARLE - Linear Models for Unbalanced Data 

SEARLE - Matrix Algebra Useful for Statistics 

SPRINGER - The Algebra of Random Variables 

STEUER - Multiple Criteria Optimization 

STOYAN - Comparison Methods for Queues and Other Stochastic 
Models 

STOYAN, KENDALL, and MECKE - Stochastic Geometry and 
Its Applications 

THOMPSON - Empirical Model Building 

TIJMS - Stochastic Modeling and Analysis: A Computational 
Approach 

TITTERINGTON, SMITH, and MAKOV . Statistical Analysis of 
Finite Mixture Distributions 

UPTON - The Analysis of Cross-Tabulated Data

UPTON and FINGLETON - Spatial Data Analysis by Example, 
Volume I: Point Pattern and Quantitative Data 

UPTON and FINGLETON - Spatial Data Analysis by Example, 
Volume II: Categorical and Directional Data 

VAN RIJCKEVORSEL and DE LEEUW - Component and 
Correspondence Analysis 

WEISBERG - Applied Linear Regression, Second Edition 

WHITTLE - Optimization Over Time: Dynamic Programming 
and Stochastic Control, Volume I and Volume II 

WHITTLE - Systems in Stochastic Equilibrium 

WILLIAMS - A Sampler on Sampling 

WONNACOTT and WONNACOTT - Econometrics, Second Edition 

WONNACOTT and WONNACOTT - Introductory Statistics, Fourth 
Edition 

WONNACOTT and WONNACOTT - Introductory Statistics for 
Business and Economics, Third Edition 

WOOLSON - Statistical Methods for the Analysis of Biomedical Data


Tracts on Probability and Statistics 


AMBARTZUMIAN - Combinatorial Integral Geometry 

BIBBY and TOUTENBURG - Prediction and Improved Estimation 
in Linear Models 

BILLINGSLEY - Convergence of Probability Measures 

DEVROYE and GYORFI - Nonparametric Density Estimation: 
The L1 View

KELLY - Reversibility and Stochastic Networks 

RAKTOE, HEDAYAT, and FEDERER - Factorial Designs 

TOUTENBURG - Prior Information in Linear Models 




